Real time robust localization via visual inertial odometry

ABSTRACT

A camera-based localization system is provided. The camera-based localization system may help an unmanned vehicle continue its operation in a GPS-denied environment with minimal increase in vehicular cost and payload. In one aspect, a method, a computer-readable medium, and an apparatus for localization via visual inertial odometry are provided. The apparatus may construct an optical flow based on feature points across a first video frame and a second video frame captured by a camera of the apparatus. The apparatus may refine the angular velocity and the linear velocity corresponding to the second video frame by solving a quadratic optimization problem constructed based on the optical flow and on initial values of the angular velocity and the linear velocity corresponding to the second video frame. The apparatus may estimate the pose of the apparatus based on the refined angular velocity and the refined linear velocity.

CROSS-REFERENCE TO RELATED APPLICATION(S)

This application claims the benefit of Singapore Patent Application No. 10201702561Y, entitled “Real Time Robust Localization via Visual Inertial Odometry” and filed on Mar. 29, 2017, which is expressly incorporated by reference herein in its entirety.

TECHNICAL FIELD

Various aspects of this disclosure generally relate to localization systems, and more particularly, to a camera-based localization system.

BACKGROUND

It is important for a moving device (e.g., a robot or an unmanned vehicle) to be able to navigate in its environment. To do so, the moving device should be able to determine its own position and then plan a path towards a goal location. Localization enables the moving device to establish its own position and orientation within its frame of reference.

An inertial measurement unit (IMU) is an electronic device that measures and reports a body's specific force, angular rate, and sometimes the magnetic field surrounding the body, using a combination of accelerometers and gyroscopes, and sometimes magnetometers. IMUs are typically used to maneuver aircraft, including unmanned aerial vehicles (UAVs), and spacecraft, including satellites, among many other vehicles. An IMU allows a GPS receiver to keep working when GPS signals are unavailable, such as in tunnels, inside buildings, or when electronic interference is present.

Visual odometry is the process of determining the position and orientation of a moving device by analyzing the associated camera images. If an inertial measurement unit is used within the visual odometry system, it is commonly referred to as visual inertial odometry (VIO).

SUMMARY

The following presents a simplified summary in order to provide a basic understanding of various aspects of the disclosed invention. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. The sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.

In one aspect of the disclosure, a method, a computer-readable medium, and an apparatus for localization via visual inertial odometry are provided. The apparatus may determine a first linear velocity of the apparatus and a first rotation matrix corresponding to a first video frame captured by a camera of the apparatus. The apparatus may estimate a second linear velocity of the apparatus corresponding to a second video frame captured by the camera based on the first linear velocity, the first rotation matrix, and an initial angular velocity of the apparatus corresponding to the second video frame. The angular velocity corresponding to the second video frame may be provided by an inertial measurement unit of the apparatus. The apparatus may construct an optical flow based on feature points across the first video frame and the second video frame. The apparatus may refine the angular velocity and the second linear velocity via solving a quadratic optimization problem constructed based on the optical flow, the estimated second linear velocity, and the angular velocity provided by the inertial measurement unit. The apparatus may estimate a pose of the apparatus corresponding to the second video frame based on the refined angular velocity and the refined second linear velocity corresponding to the second video frame.

In some embodiments, the apparatus may reduce the drift of the estimated pose corresponding to the second video frame using the nearest key frame of the second video frame. In such embodiments, to reduce the drift of the estimated pose, the apparatus may calculate initial values of a second rotation matrix and translation vector of the second video frame, select the nearest key frame of the second video frame, and refine the pose corresponding to the second video frame using feature points in a common captured region between the second video frame and the nearest key frame.

To the accomplishment of the foregoing and related ends, the aspects disclosed include the features hereinafter fully described and particularly pointed out in the claims. The following description and the annexed drawings set forth in detail illustrate certain features of the aspects of the disclosure. These features are indicative, however, of but a few of the various ways in which the principles of various aspects may be employed, and this description is intended to include all such aspects and their equivalents.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram illustrating the overall framework of the camera-based localization system of some embodiments.

FIG. 2 is a flowchart of a method of localization via visual inertial odometry.

FIG. 3 is a flowchart of a method of reducing the drift of an estimated pose corresponding to a current video frame.

FIG. 4 is a conceptual data flow diagram illustrating the data flow between different means/components in an exemplary apparatus.

FIG. 5 is a diagram illustrating an example of a hardware implementation for an apparatus employing a processing system.

DETAILED DESCRIPTION

The detailed description set forth below in connection with the appended drawings is intended as a description of various possible configurations and is not intended to represent the only configurations in which the concepts described herein may be practiced. The detailed description includes specific details for the purpose of providing a thorough understanding of various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. In some instances, well-known structures and components are shown in block diagram form in order to avoid obscuring such concepts.

Several aspects of a camera-based localization system will now be presented with reference to various apparatus and methods. The apparatus and methods will be described in the following detailed description and illustrated in the accompanying drawings by various blocks, components, circuits, processes, algorithms, etc. (collectively referred to as “elements”). These elements may be implemented using electronic hardware, computer software, or any combination thereof. Whether such elements are implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system.

By way of example, an element, or any portion of an element, or any combination of elements may be implemented as a “processing system” that includes one or more processors. Examples of processors include microprocessors, microcontrollers, graphics processing units (GPUs), central processing units (CPUs), application processors, digital signal processors (DSPs), reduced instruction set computing (RISC) processors, systems on a chip (SoC), baseband processors, field programmable gate arrays (FPGAs), programmable logic devices (PLDs), state machines, gated logic, discrete hardware circuits, and other suitable hardware configured to perform the various functionality described throughout this disclosure. One or more processors in the processing system may execute software. Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software components, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.

Accordingly, in one or more example embodiments, the functions described may be implemented in hardware, software, or any combination thereof. If implemented in software, the functions may be stored on or encoded as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media. Storage media may be any available media that can be accessed by a computer. By way of example, and not limitation, such computer-readable media may include a random-access memory (RAM), a read-only memory (ROM), an electrically erasable programmable ROM (EEPROM), optical disk storage, magnetic disk storage, other magnetic storage devices, combinations of the aforementioned types of computer-readable media, or any other medium that can be used to store computer executable code in the form of instructions or data structures that can be accessed by a computer.

In the disclosure, a camera-based localization system is provided. The camera-based localization system may help an unmanned vehicle continue its operation in a GPS-denied environment with minimal increase in vehicular cost and payload. The camera-based localization system is based on the fusion of visual signals and inertial measurement unit (IMU) data. The key aspect of the system is a fast, optical flow based visual inertial odometry (VIO) method. Unlike traditional VIO techniques, the camera-based localization system may not require photo-consistency.

FIG. 1 is a diagram 100 illustrating the overall framework of the camera-based localization system of some embodiments. In some embodiments, the camera-based localization system may be part of an unmanned vehicle. In some embodiments, the camera-based localization system may be performed by an apparatus (e.g., the apparatus 402/402′ described below with reference to FIG. 4 or FIG. 5).

At 102, the apparatus may fuse information of the (k−1)th frame via an adaptive extended Kalman filter (EKF). The (k−1)th frame may be a video frame captured by a camera associated with the apparatus. As a result, the apparatus may obtain the pose of the (k−1)th frame, as well as the linear velocity, rotation matrix, and height of the (k−1)th frame.

In some embodiments, the pose of an object may be a combination of position and orientation of the object. In some embodiments, the pose of an object may be the position of the object. In some embodiments, a variable (e.g., linear velocity) of a particular frame (e.g., kth frame) may denote the value of the variable regarding the apparatus at the time instance of the particular frame being captured by the camera.

At 104, the apparatus may estimate the linear velocity of the kth frame. The kth frame may be a video frame captured by the camera subsequent to the (k−1)th frame. In some embodiments, the linear velocity of the kth frame may be estimated based on the linear velocity, rotation matrix, and height of the (k−1)th frame, as well as the angular velocity and linear acceleration of the kth frame. The angular velocity and linear acceleration of the kth frame may be provided by an IMU 120.

At 106, the apparatus may refine the angular velocity and the linear velocity of the kth frame using the (k−1)th frame and the kth frame. In some embodiments, the angular velocity and linear velocity of the kth frame may be refined based on the linear velocity of the kth frame (estimated at 104), the angular velocity of the kth frame (provided by the IMU 120), and the height of the kth frame (provided by a height sensor 122).

At 108, the apparatus may estimate the initial pose of the kth frame. In some embodiments, the initial pose of the kth frame may be estimated based on the pose of the (k−1)th frame (obtained at 102), the refined angular velocity and linear velocity of the kth frame (obtained at 106), and data provided by the IMU 120.

At 110, the apparatus may reduce the drift of the kth frame using the nearest key frame. In some embodiments, the drift of the kth frame may be reduced based on the initial pose of the kth frame (obtained at 108) and the height of the kth frame (provided by the height sensor 122). As a result of the drift reduction, the apparatus may obtain the rotation matrix and the translation vector of the kth frame.

At 112, the apparatus may fuse information of the kth frame via an adaptive EKF. The information being fused may include the rotation matrix and the translation vector of the kth frame (obtained at 110), the height of the kth frame (provided by the height sensor 122), and the IMU data of the kth frame (provided by the IMU 120). Consequently, the apparatus may obtain the linear velocity, rotation matrix, translation vector, and height of the kth frame.
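The data flow of steps 104 to 112 can be sketched in Python as a per-frame function. This is only an illustrative skeleton: the stage names (`estimate_velocity`, `refine_velocities`, `initial_pose`, `reduce_drift`, `fuse`) are hypothetical placeholders, and each stage is passed in as a callable so that the sketch shows which outputs feed which steps without committing to any particular implementation.

```python
from typing import Any, Callable, Dict


def process_frame(prev_state: Any, frames: Any, imu: Any, height: float,
                  stages: Dict[str, Callable[..., Any]]) -> Any:
    """Run steps 104-112 of FIG. 1 for the kth frame.

    prev_state is the fused (k-1)th output of step 102; the stage names are
    hypothetical placeholders for the processing blocks described above.
    """
    # 104: predict the kth linear velocity from the (k-1)th state and IMU data.
    v_init = stages["estimate_velocity"](prev_state, imu)
    # 106: refine angular and linear velocity using the (k-1)th and kth frames.
    w_ref, v_ref = stages["refine_velocities"](frames, v_init, imu, height)
    # 108: initial kth pose from the (k-1)th pose and the refined velocities.
    pose_init = stages["initial_pose"](prev_state, w_ref, v_ref, imu)
    # 110: drift reduction against the nearest key frame -> (rotation, translation).
    rotation, translation = stages["reduce_drift"](pose_init, height)
    # 112: adaptive EKF fusion; the result becomes prev_state for frame k+1.
    return stages["fuse"](rotation, translation, height, imu)
```

Passing the stages in as a dictionary keeps the ordering of FIG. 1 explicit while allowing each block (for example, the adaptive EKF) to be replaced independently.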

In some embodiments, a discrete dynamic model is developed using matrix theory and linear control system theory. In one embodiment, the model may represent a dynamic model of the unmanned aerial vehicle (UAV). The model may be used to estimate initial values of optimized variables.

Let $V=[V_{x}\ V_{y}\ V_{z}]^{T}$ be the linear velocity in the world frame and $R=[R_{1}\ R_{2}\ R_{3}]$ be the rotation matrix from the world reference frame to the body frame, with $R_{1}$, $R_{2}$, and $R_{3}$ its columns. Their dynamics are represented as

$\begin{matrix} {{\begin{bmatrix} \overset{.}{V} \\ \overset{.}{R_{1}} \\ \overset{.}{R_{2}} \\ \overset{.}{R_{3}} \end{bmatrix} = {\begin{bmatrix} {R^{T}a} \\ {{- {{sk}(\omega)}}R_{1}} \\ {{- {{sk}(\omega)}}R_{2}} \\ {{- {{sk}(\omega)}}R_{3}} \end{bmatrix} - \begin{bmatrix} g \\ 0 \\ 0 \\ 0 \end{bmatrix}}},} & (1) \end{matrix}$

where $a=[a_{1}\ a_{2}\ a_{3}]^{T}$ is the linear acceleration in the body frame measured with an accelerometer (e.g., in the IMU 120), and $g=[0\ 0\ g_{w}]^{T}$, with $g_{w}$ being the gravitational acceleration in the world frame. $\omega=[\omega_{x}\ \omega_{y}\ \omega_{z}]^{T}$ is the angular velocity of the UAV measured by a gyroscope (e.g., in the IMU 120), and $sk(\omega)$ is the skew-symmetric matrix defined as

$\begin{matrix} {{{sk}(\omega)} = {\begin{bmatrix} 0 & {- \omega_{z}} & \omega_{y} \\ \omega_{z} & 0 & {- \omega_{x}} \\ {- \omega_{y}} & \omega_{x} & 0 \end{bmatrix}.}} & (2) \end{matrix}$

Using matrix theory, it can be derived that

$\begin{matrix} {{R^{T}a} = {\begin{bmatrix} {R_{1}^{T}a} \\ {R_{2}^{T}a} \\ {R_{3}^{T}a} \end{bmatrix} = {\begin{bmatrix} {a^{T}R_{1}} \\ {a^{T}R_{2}} \\ {a^{T}R_{3}} \end{bmatrix}.}}} & (3) \end{matrix}$

It follows that

$\begin{matrix} {{\begin{bmatrix} \overset{.}{V} \\ \overset{.}{R_{1}} \\ \overset{.}{R_{2}} \\ \overset{.}{R_{3}} \end{bmatrix} = {{{A\left( {a,\omega} \right)}\begin{bmatrix} V \\ R_{1} \\ R_{2} \\ R_{3} \end{bmatrix}} - \begin{bmatrix} g \\ 0 \\ 0 \\ 0 \end{bmatrix}}},} & (4) \end{matrix}$

where the matrix A(a, ω) is given by

$\begin{matrix} {{{A\left( {a,\omega} \right)} = \begin{bmatrix} 0 & {{\hat{A}}_{1}(a)} & {{\hat{A}}_{2}(a)} & {{\hat{A}}_{3}(a)} \\ 0 & {- {{sk}(\omega)}} & 0 & 0 \\ 0 & 0 & {- {{sk}(\omega)}} & 0 \\ 0 & 0 & 0 & {- {{sk}(\omega)}} \end{bmatrix}},{{{\hat{A}}_{1}(a)} = \begin{bmatrix} a_{1} & a_{2} & a_{3} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}},{{{\hat{A}}_{2}(a)} = \begin{bmatrix} 0 & 0 & 0 \\ a_{1} & a_{2} & a_{3} \\ 0 & 0 & 0 \end{bmatrix}},{{{\hat{A}}_{3}(a)} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ a_{1} & a_{2} & a_{3} \end{bmatrix}}} & (5) \end{matrix}$

When the values of a and ω are fixed, it can easily be derived that

$\begin{matrix} {\exp^{A(a,\omega)} = \begin{bmatrix} I & {\tilde{A}}_{1}(a,\omega) & {\tilde{A}}_{2}(a,\omega) & {\tilde{A}}_{3}(a,\omega) \\ 0 & E(\omega) & 0 & 0 \\ 0 & 0 & E(\omega) & 0 \\ 0 & 0 & 0 & E(\omega) \end{bmatrix},\quad {\tilde{A}}_{1}(a,\omega) = \begin{bmatrix} \Gamma^{T}(\omega)a & 0 & 0 \end{bmatrix}^{T},\quad {\tilde{A}}_{2}(a,\omega) = \begin{bmatrix} 0 & \Gamma^{T}(\omega)a & 0 \end{bmatrix}^{T},\quad {\tilde{A}}_{3}(a,\omega) = \begin{bmatrix} 0 & 0 & \Gamma^{T}(\omega)a \end{bmatrix}^{T},} & (6) \end{matrix}$

where I is a 3×3 identity matrix, and the matrix E(ω) is given as

$\begin{matrix} {{{E(\omega)} = {I - {\frac{\sin {\omega }}{\omega }{{sk}(\omega)}} + {\frac{1 - {\cos {\omega }}}{{\omega }^{2}}\left( {{sk}(\omega)} \right)^{2}}}}{{\omega } = \sqrt{\omega_{x}^{2} + \omega_{y}^{2} + \omega_{z}^{2}}}} & (7) \end{matrix}$

and the matrix Γ(ω) is given by

$\begin{matrix} {{\Gamma (\omega)} = {I + {\frac{{\cos {\omega }} - 1}{{\omega }^{2}}{{sk}(\omega)}} + {\frac{{\omega } - {\sin {\omega }}}{{\omega }^{3}}{\left( {{sk}(\omega)} \right)^{2}.}}}} & (8) \end{matrix}$

For two discrete time instances k and (k+1) separated by an interval ΔT, it follows from linear control system theory that

$\begin{matrix} {\begin{bmatrix} {V\left( {k + 1} \right)} \\ {R_{1}\left( {k + 1} \right)} \\ {R_{2}\left( {k + 1} \right)} \\ {R_{3}\left( {k + 1} \right)} \end{bmatrix} = {{\exp^{A{({{\Delta \; {Ta}},{\Delta \; T\; \omega}})}}\begin{bmatrix} {V(k)} \\ {R_{1}(k)} \\ {R_{2}(k)} \\ {R_{3}(k)} \end{bmatrix}} - {\begin{bmatrix} {\Delta \; {Tg}} \\ 0 \\ 0 \\ 0 \end{bmatrix}.}}} & (9) \end{matrix}$

The dynamics of the camera center are now addressed. Let $P_{c}=[X_{c}\ Y_{c}\ Z_{c}]^{T}$ be the position of the camera center in the world reference frame. The dynamics of $P_{c}$ are modelled as

$\begin{matrix} {{\dot{P}}_{c} = V.} & (10) \end{matrix}$

Combining this with equation (4) gives

$\begin{matrix} {{\begin{bmatrix} {\overset{.}{P}}_{c} \\ \overset{.}{V} \\ \overset{.}{R_{1}} \\ \overset{.}{R_{2}} \\ \overset{.}{R_{3}} \end{bmatrix} = {{{B\left( {1,a,\omega} \right)}\begin{bmatrix} P_{c} \\ V \\ R_{1} \\ R_{2} \\ R_{3} \end{bmatrix}} - \begin{bmatrix} 0 \\ g \\ 0 \\ 0 \\ 0 \end{bmatrix}}},} & (11) \end{matrix}$

where the matrix B(s, a, ω) is given by

$\begin{matrix} {{B\left( {s,a,\omega} \right)} = {\begin{bmatrix} 0 & {sI} & 0 & 0 & 0 \\ 0 & 0 & {{\hat{A}}_{1}(a)} & {{\hat{A}}_{2}(a)} & {{\hat{A}}_{3}(a)} \\ 0 & 0 & {- {{sk}(\omega)}} & 0 & 0 \\ 0 & 0 & 0 & {- {{sk}(\omega)}} & 0 \\ 0 & 0 & 0 & 0 & {- {{sk}(\omega)}} \end{bmatrix}.}} & (12) \end{matrix}$

Similarly, using matrix theory, it can be derived that

$\begin{matrix} {\exp^{B(s,a,\omega)} = \begin{bmatrix} I & \begin{bmatrix} sI & {\tilde{B}}_{1}(s,a,\omega) & {\tilde{B}}_{2}(s,a,\omega) & {\tilde{B}}_{3}(s,a,\omega) \end{bmatrix} \\ 0 & \exp^{A(a,\omega)} \end{bmatrix},} & (13) \end{matrix}$

where the matrices $\tilde{B}_{i}(s,a,\omega)$ ($i=1,2,3$) are given as

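$\begin{matrix} {{\tilde{B}}_{1}(s,a,\omega) = \begin{bmatrix} s\Lambda^{T}(\omega)a & 0 & 0 \end{bmatrix}^{T},\quad {\tilde{B}}_{2}(s,a,\omega) = \begin{bmatrix} 0 & s\Lambda^{T}(\omega)a & 0 \end{bmatrix}^{T},\quad {\tilde{B}}_{3}(s,a,\omega) = \begin{bmatrix} 0 & 0 & s\Lambda^{T}(\omega)a \end{bmatrix}^{T},} & (14) \end{matrix}$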

and the matrix Λ(ω) is given by

$\begin{matrix} {{\Lambda (\omega)} = {I + {\frac{{\sin {\omega }} - {\omega }}{{\omega }^{3}}{{sk}(\omega)}} + {\frac{{\cos {\omega }} - 1 + \frac{{\omega }^{2}}{2}}{{\omega }^{4}}{\left( {{sk}(\omega)} \right)^{2}.}}}} & (15) \end{matrix}$

Using linear control system theory, a discrete dynamic model is obtained as

$\begin{matrix} {\begin{bmatrix} P_{c}(k+1) \\ V(k+1) \\ R_{1}(k+1) \\ R_{2}(k+1) \\ R_{3}(k+1) \end{bmatrix} = \exp^{B(\Delta T,\,\Delta T a,\,\Delta T \omega)}\begin{bmatrix} P_{c}(k) \\ V(k) \\ R_{1}(k) \\ R_{2}(k) \\ R_{3}(k) \end{bmatrix} - \begin{bmatrix} \frac{\Delta T^{2}}{2}g \\ \Delta T g \\ 0 \\ 0 \\ 0 \end{bmatrix}.} & (16) \end{matrix}$
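The following Python sketch (assuming NumPy and ω ≠ 0; helper names are hypothetical) evaluates equation (16) in closed form and builds the 15×15 matrix B(s, a, ω) of equation (12) for comparison with a direct matrix exponential. Note that it uses Λ(ω) with a coefficient of ½ on the identity term: that is what the double integral ∫₀¹(1 − t)E(tω)dt produces, and it is what the ΔT²/2·g gravity term of equation (16) requires, since for ω → 0 the position update must reduce to P_c + ΔT·V + ½ΔT²(Rᵀa − g). With a unit coefficient the comparison against exp^{B} would fail.

```python
import numpy as np


def sk(w):
    return np.array([[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]])


def E(w):  # equation (7), w assumed nonzero
    th, K = np.linalg.norm(w), sk(w)
    return np.eye(3) - np.sin(th) / th * K + (1 - np.cos(th)) / th**2 * K @ K


def Gamma(w):  # equation (8), w assumed nonzero
    th, K = np.linalg.norm(w), sk(w)
    return np.eye(3) + (np.cos(th) - 1) / th**2 * K + (th - np.sin(th)) / th**3 * K @ K


def Lam(w):
    """Lambda(w) as the integral of (1 - t) E(t w) over [0, 1]; note the 1/2 on I."""
    th, K = np.linalg.norm(w), sk(w)
    return (0.5 * np.eye(3) + (np.sin(th) - th) / th**3 * K
            + (np.cos(th) - 1 + th**2 / 2) / th**4 * K @ K)


def B_matrix(s, a, w):
    """15x15 matrix B(s, a, w) of equation (12) on the state [Pc; V; R1; R2; R3]."""
    B = np.zeros((15, 15))
    B[0:3, 3:6] = s * np.eye(3)
    for i in range(3):
        B[3 + i, 6 + 3 * i:9 + 3 * i] = a                       # A_hat_{i+1}(a) row
        B[6 + 3 * i:9 + 3 * i, 6 + 3 * i:9 + 3 * i] = -sk(w)
    return B


def step(Pc, V, R, a, w, dT, g):
    """Closed-form evaluation of equation (16) for one interval dT."""
    G, L = Gamma(dT * w), Lam(dT * w)
    Pc_next = Pc + dT * V + dT**2 * (R.T @ L.T @ a) - 0.5 * dT**2 * g
    V_next = V + dT * (R.T @ G.T @ a) - dT * g
    return Pc_next, V_next, E(dT * w) @ R
```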

The discrete dynamic model in equation (16) is used to pre-integrate all the IMU data between two video frames. For simplicity, the current frame is assumed to be the (m+1)th frame and the previous one the mth frame. Suppose that there are n IMU samples between the two successive video frames.

Applying the dynamic model in equation (16) over a single IMU sampling interval ΔT(k_m) gives

$\begin{matrix} {\begin{aligned} R_{i}(k_{m}+1) &= E\left(\Delta T(k_{m})\,\omega(k_{m})\right)R_{i}(k_{m}), \\ V_{i}(k_{m}+1) &= V_{i}(k_{m}) + \Delta T(k_{m})\left(a^{T}(k_{m})\,\Gamma\left(\Delta T(k_{m})\,\omega(k_{m})\right)R_{i}(k_{m}) - g_{i}\right), \\ P_{c,i}(k_{m}+1) &= P_{c,i}(k_{m}) + \Delta T(k_{m})\,V_{i}(k_{m}) + \Delta T^{2}(k_{m})\left(a^{T}(k_{m})\,\Lambda\left(\Delta T(k_{m})\,\omega(k_{m})\right)R_{i}(k_{m}) - \frac{g_{i}}{2}\right), \end{aligned}} & (17) \end{matrix}$

where $i=1,2,3$, $g_{i}$ is the ith component of $g$, and $k_{m}=mn$. Applying equation (17) at each IMU sample $k_{m}+j$ ($j=0,1,\ldots,n-1$), it can be derived that

$\begin{matrix} {{{R_{i}\left( k_{m + 1} \right)} = {\prod\limits_{j = 0}^{n - 1}{{E\left( {\Delta \; {T\left( {k_{m} + j} \right)}{\omega \left( {k_{m} + j} \right)}} \right)}{R_{i}\left( k_{m} \right)}}}},{{P_{c,i}\left( k_{m + 1} \right)} = {{P_{c,i}\left( k_{m} \right)} - {\Theta_{i}\left( k_{m} \right)}}},} & (18) \end{matrix}$

and the value of Θ_(i) (k_(m)) is computed using the equation (17).
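A sketch of the pre-integration loop of equations (17) and (18) in Python, assuming NumPy and IMU readings held constant over each sampling interval (the function name `preintegrate` and its argument layout are hypothetical). Two choices here are ours rather than the text's: Θ is returned as the displacement P_c(k_{m+1}) − P_c(k_m), the sign convention under which equations (21) and (29) hold, and Λ uses a coefficient of ½ on the identity, as the ΔT²/2 gravity term of equation (16) requires. A truncated Taylor series replaces the closed forms for very small rotation angles to avoid cancellation.

```python
import numpy as np


def sk(w):
    return np.array([[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]])


def coeffs(th):
    """Scalar coefficients shared by E, Gamma and Lambda (equations (7), (8), (15)).

    A truncated Taylor series is used for small angles, where the closed forms
    suffer from catastrophic cancellation.
    """
    if th < 1e-2:
        t2 = th * th
        return 1 - t2 / 6, 0.5 - t2 / 24, 1 / 6 - t2 / 120, 1 / 24 - t2 / 720
    return (np.sin(th) / th, (1 - np.cos(th)) / th**2,
            (th - np.sin(th)) / th**3, (np.cos(th) - 1 + th**2 / 2) / th**4)


def preintegrate(Pc, V, R, accels, gyros, dts, g):
    """Iterate equation (17) over the n IMU samples between frames m and m+1.

    Returns R, V and Pc at frame m+1, the ordered product of equation (19), and
    the camera-center displacement Theta = Pc(k_{m+1}) - Pc(k_m).
    """
    Pc_start = np.array(Pc, dtype=float)
    E_prod = np.eye(3)
    for a, w, dT in zip(accels, gyros, dts):
        wd = dT * np.asarray(w, dtype=float)
        K = sk(wd)
        c0, c1, c2, c3 = coeffs(np.linalg.norm(wd))
        Ew = np.eye(3) - c0 * K + c1 * K @ K         # E(dT w)
        G = np.eye(3) - c1 * K + c2 * K @ K          # Gamma(dT w)
        L = 0.5 * np.eye(3) - c2 * K + c3 * K @ K    # Lambda(dT w), 1/2 on I
        Pc = Pc + dT * V + dT**2 * (R.T @ L.T @ a) - 0.5 * dT**2 * g
        V = V + dT * (R.T @ G.T @ a) - dT * g
        R = Ew @ R
        E_prod = Ew @ E_prod                         # later samples multiply on the left
    return R, V, Pc, E_prod, Pc - Pc_start
```

Multiplying each new factor on the left keeps the product in the order required by equation (18), so the returned `E_prod` plays the role of E(ω̃) in equation (19).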

Using Lie algebra (the exponential map onto the rotation group SO(3) is surjective), it can be shown that there exists $\tilde{\omega}=[\tilde{\omega}_{1}\ \tilde{\omega}_{2}\ \tilde{\omega}_{3}]^{T}$ such that

$\begin{matrix} {{E\left( \overset{\sim}{\omega} \right)} = {\prod\limits_{j = 0}^{n - 1}{{E\left( {\Delta \; {T\left( {k_{m} + j} \right)}{\omega \left( {k_{m} + j} \right)}} \right)}.}}} & (19) \end{matrix}$

Let $\tilde{P}_{i}(k_{m})=[\tilde{X}_{i}(k_{m})\ \tilde{Y}_{i}(k_{m})\ \tilde{Z}_{i}(k_{m})]^{T}$ be the coordinates, in the mth body frame, of the 3-D point corresponding to the ith pixel, and let $[X_{i}(k_{m})\ Y_{i}(k_{m})\ Z_{i}(k_{m})]^{T}$ be the coordinates of the same point in the world reference frame. Their relationship is represented by

$\begin{matrix} {\begin{bmatrix} {{\overset{\sim}{X}}_{i}\left( k_{m} \right)} \\ {{\overset{\sim}{Y}}_{i}\left( k_{m} \right)} \\ {{\overset{\sim}{Z}}_{i}\left( k_{m} \right)} \end{bmatrix} = {{R\left( k_{m} \right)}{\left( {\begin{bmatrix} {X_{i}\left( k_{m} \right)} \\ {Y_{i}\left( k_{m} \right)} \\ {Z_{i}\left( k_{m} \right)} \end{bmatrix} - \begin{bmatrix} {X_{c}\left( k_{m} \right)} \\ {Y_{c}\left( k_{m} \right)} \\ {Z_{c}\left( k_{m} \right)} \end{bmatrix}} \right).}}} & (20) \end{matrix}$

It follows that

$\begin{matrix} {{\begin{bmatrix} {{\overset{\sim}{X}}_{i}\left( k_{m + 1} \right)} \\ {{\overset{\sim}{Y}}_{i}\left( k_{m + 1} \right)} \\ {{\overset{\sim}{Z}}_{i}\left( k_{m + 1} \right)} \end{bmatrix} = {{{E\left( \overset{\sim}{\omega} \right)}\begin{bmatrix} {{\overset{\sim}{X}}_{i}\left( k_{m} \right)} \\ {{\overset{\sim}{Y}}_{i}\left( k_{m} \right)} \\ {{\overset{\sim}{Z}}_{i}\left( k_{m} \right)} \end{bmatrix}} - {{E\left( \overset{\sim}{\omega} \right)}\overset{\sim}{\Theta}}}},} & (21) \end{matrix}$

where the value of $\tilde{\Theta}$ is computed as

$\begin{matrix} {\tilde{\Theta} = R(k_{m})\,\Theta(k_{m}).} & (22) \end{matrix}$
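The relation in equations (20) to (22) can be checked numerically. The Python sketch below (assuming NumPy; function names are hypothetical) maps a world point into body frame m with equation (20) and predicts its body-frame coordinates at frame m+1 with equation (21). It assumes R(k_{m+1}) = E(ω̃)R(k_m) and P_c(k_{m+1}) = P_c(k_m) + Θ(k_m), i.e. Θ is the displacement of the camera center, which is the sign convention under which equation (21) holds with Θ̃ = R(k_m)Θ(k_m).

```python
import numpy as np


def to_body(X, R, Pc):
    """Equation (20): world point X in the body frame with rotation R and center Pc."""
    return R @ (X - Pc)


def predict_next_body(P_tilde_m, E_w, Theta_tilde):
    """Equation (21): the same point in body frame m+1, given E(w~) and Theta~."""
    return E_w @ P_tilde_m - E_w @ Theta_tilde


# Pure forward motion of 1 m (hypothetical values): a point 5 m ahead ends up 4 m ahead.
print(predict_next_body(np.array([0.0, 0.0, 5.0]), np.eye(3), np.array([0.0, 0.0, 1.0])))  # prints: [0. 0. 4.]
```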

Equation (21) shows that the rigid-body transformation between the two body frames is determined by $\tilde{\omega}$ and $\tilde{\Theta}$. In some embodiments, $\tilde{\omega}$ and $\tilde{\Theta}$ may be the angular velocity and linear velocity, respectively. $\tilde{\omega}$ and $\tilde{\Theta}$ are the optimized variables in the nonlinear optimization problem described below. A nonlinear optimization problem may be converted into a quadratic optimization problem by linearizing it around initial values of the optimized variables, if such initial values are available. The initial values of $\tilde{\omega}$ and $\tilde{\Theta}$ are given by equations (19) and (22), respectively. Thus, motion pre-integration of the IMU data may be used to determine the initial values of $\tilde{\omega}$ and $\tilde{\Theta}$. In some embodiments, the initial values of $\tilde{\omega}$ and $\tilde{\Theta}$ may be determined at 104.

$\tilde{\omega}$ and $\tilde{\Theta}$ may then be refined using the optical flow between two successive frames. The refinement may be divided into two steps. In the first step, the optical flow of each feature point is estimated using, for example, a conventional scale-invariant feature transform (SIFT)-based method. In the second step, the values of $\tilde{\omega}$ and $\tilde{\Theta}$ for the current frame may be refined by solving an optimization problem built from the optical flow of all the feature points.

Let $p_{i}=[u_{i}\ v_{i}]^{T}$ be the ith feature point in the mth frame. The relationship between $p_{i}$ and $\tilde{P}_{i}(k_{m})$ is represented as

$\begin{matrix} {{\begin{bmatrix} u_{i} \\ v_{i} \end{bmatrix} = {{\pi \left( {{\overset{\sim}{P}}_{i}\left( k_{m} \right)} \right)} = \begin{bmatrix} {{f_{x}\frac{{\overset{\sim}{X}}_{i}\left( k_{m} \right)}{{\overset{\sim}{Z}}_{i}\left( k_{m} \right)}} + c_{x}} \\ {{f_{y}\frac{{\overset{\sim}{Y}}_{i}\left( k_{m} \right)}{{\overset{\sim}{Z}}_{i}\left( k_{m} \right)}} + c_{y}} \end{bmatrix}}},} & (23) \end{matrix}$

where $f_{x}$ and $f_{y}$ are the focal lengths of the camera in pixels, and $c_{x}$ and $c_{y}$ are the coordinates of the principal point in pixels. If the value of $\tilde{Z}_{i}(k_{m})$ is available, the coordinate $\tilde{P}_{i}(k_{m})$ is given as

$\begin{matrix} {{{\overset{\sim}{P}}_{i}\left( k_{m} \right)} = {{\pi^{- 1}\left( {p_{i},{{\overset{\sim}{Z}}_{i}\left( k_{m} \right)}} \right)} = {\begin{bmatrix} {\frac{u_{i} - c_{x}}{f_{x}}{{\overset{\sim}{Z}}_{i}\left( k_{m} \right)}} & {\frac{v_{i} - c_{y}}{f_{y}}{{\overset{\sim}{Z}}_{i}\left( k_{m} \right)}} & {{\overset{\sim}{Z}}_{i}\left( k_{m} \right)} \end{bmatrix}^{T}.}}} & (24) \end{matrix}$

Let $[\hat{X}_{i}(k_{m})\ \hat{Y}_{i}(k_{m})\ \hat{Z}_{i}(k_{m})]^{T}$ be defined as

$\begin{matrix} {\begin{bmatrix} {{\hat{X}}_{i}\left( k_{m} \right)} \\ {{\hat{Y}}_{i}\left( k_{m} \right)} \\ {{\hat{Z}}_{i}\left( k_{m} \right)} \end{bmatrix} = {\begin{bmatrix} {X_{i}\left( k_{m} \right)} \\ {Y_{i}\left( k_{m} \right)} \\ {Z_{i}\left( k_{m} \right)} \end{bmatrix} - {\begin{bmatrix} {X_{c}\left( k_{m} \right)} \\ {Y_{c}\left( k_{m} \right)} \\ {Z_{c}\left( k_{m} \right)} \end{bmatrix}.}}} & (25) \end{matrix}$

It can be derived that

$\begin{matrix} {\begin{bmatrix} {{\hat{X}}_{i}\left( k_{m} \right)} \\ {{\hat{Y}}_{i}\left( k_{m} \right)} \\ {{\hat{Z}}_{i}\left( k_{m} \right)} \end{bmatrix} = {{\begin{bmatrix} {R_{11}\left( k_{m} \right)} & {R_{21}\left( k_{m} \right)} & {R_{31}\left( k_{m} \right)} \\ {R_{12}\left( k_{m} \right)} & {R_{22}\left( k_{m} \right)} & {R_{32}\left( k_{m} \right)} \\ {R_{13}\left( k_{m} \right)} & {R_{23}\left( k_{m} \right)} & {R_{33}\left( k_{m} \right)} \end{bmatrix}\begin{bmatrix} {\frac{u_{i} - c_{x}}{f_{x}}{{\overset{\sim}{Z}}_{i}\left( k_{m} \right)}} \\ {\frac{v_{i} - c_{y}}{f_{y}}{{\overset{\sim}{Z}}_{i}\left( k_{m} \right)}} \\ {{\overset{\sim}{Z}}_{i}\left( k_{m} \right)} \end{bmatrix}}.}} & (26) \end{matrix}$

From the third row of equation (26), the value of $\tilde{Z}_{i}(k_{m})$ is computed as

$\begin{matrix} {{{{\overset{\sim}{Z}}_{i}\left( k_{m} \right)} = \frac{{\hat{Z}}_{i}\left( k_{m} \right)}{{{R_{13}\left( k_{m} \right)}\frac{u_{i} - c_{x}}{f_{x}}} + {{R_{23}\left( k_{m} \right)}\frac{v_{i} - c_{y}}{f_{y}}} + {R_{33}\left( k_{m} \right)}}},} & (27) \end{matrix}$
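Equation (27) recovers the body-frame depth from the world-frame height difference Ẑ_i(k_m) = Z_i(k_m) − Z_c(k_m). In the Python sketch below (assuming NumPy; the function name and example values are hypothetical), the denominator is written as the dot product of the third column of R(k_m), i.e. (R₁₃, R₂₃, R₃₃), with the normalized ray ((u − c_x)/f_x, (v − c_y)/f_y, 1). The demo assumes a nadir-looking camera 10 m above flat ground with the world z-axis pointing up.

```python
import numpy as np


def depth_from_height(p, Z_hat, R, fx, fy, cx, cy):
    """Equation (27): body-frame depth Z~ of pixel p, given Z_hat = Z_i - Z_c.

    R is the world-to-body rotation R(k_m); R[:, 2] holds (R13, R23, R33).
    """
    u, v = p
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    return Z_hat / (R[:, 2] @ ray)


# Hypothetical nadir camera: body z points down, so R flips the world y and z axes.
R = np.diag([1.0, -1.0, -1.0])
print(depth_from_height((370.0, 140.0), -10.0, R, 500.0, 500.0, 320.0, 240.0))  # prints: 10.0
```

With the world z-axis pointing up, Ẑ_i(k_m) is negative for ground points, and the sign cancels against the downward-looking camera axis to give a positive depth.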

where, for a feature point on the ground, the magnitude of $\hat{Z}_{i}(k_{m})$ equals the height of the UAV, and its value can be obtained from the sensor fusion output of the previous frame. Thus, it is assumed to be available. The coordinate $\tilde{P}_{i}(k_{m})$ may then be computed as in equation (24).

Let the accurate optical flow at the pixel $(u_{i},v_{i})$ be $(du_{i}^{*},dv_{i}^{*})$. It is assumed that $(du_{i}^{*},dv_{i}^{*})$ is obtained via a traditional robust method. It should be pointed out that the traditional method may be improved by using the initial values of $\tilde{\omega}$ and $\tilde{\Theta}$. For example, a multi-scale algorithm may be simplified to a single-scale one, because the initial values already provide a good prediction of each feature's displacement. A new optical flow based matching error between two images $I_{m}$ and $I_{m+1}$ at the pixel $(u_{i},v_{i})$ of the image $I_{m}$ is introduced as

$\begin{matrix} {\delta I_{i}(\tilde{\omega},\tilde{\Theta}) = (du_{i} - du_{i}^{*})^{2} + (dv_{i} - dv_{i}^{*})^{2},} & (28) \end{matrix}$

where $E=E(\tilde{\omega})$, and $du_{i}$ and $dv_{i}$ are computed as

$\begin{matrix} {{{{du}_{i} = {{f_{x}\frac{{E_{1}{{\overset{\sim}{P}}_{i}\left( k_{m} \right)}} - {E_{1}\overset{\sim}{\Theta}}}{{E_{3}{{\overset{\sim}{P}}_{i}\left( k_{m} \right)}} - {E_{3}\overset{\sim}{\Theta}}}} + c_{x} - u_{i}}},{{dv}_{i} = {{f_{y}\frac{{E_{2}{{\overset{\sim}{P}}_{i}\left( k_{m} \right)}} - {E_{2}\overset{\sim}{\Theta}}}{{E_{3}{{\overset{\sim}{P}}_{i}\left( k_{m} \right)}} - {E_{3}\overset{\sim}{\Theta}}}} + c_{y} - v_{i}}}}{{{and}\mspace{14mu} E} = {\begin{bmatrix} E_{1}^{T} & E_{2}^{T} & E_{3}^{T} \end{bmatrix}^{T}.}}} & (29) \end{matrix}$

The optimal values of $\tilde{\omega}$ and $\tilde{\Theta}$ may be obtained by solving the following optimization problem:

$\begin{matrix} {\underset{\overset{\sim}{\omega},\overset{\sim}{\Theta}}{argmin}{\left\{ {\sum\limits_{i}^{\;}{\delta \; {I_{i}\left( {\overset{\sim}{\omega},\overset{\sim}{\Theta}} \right)}}} \right\}.}} & (30) \end{matrix}$
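The predicted flow of equation (29) and the matching cost of equations (28) and (30) can be sketched in Python as follows, assuming NumPy; `predicted_flow`, `matching_cost`, and the intrinsics are hypothetical. The measured flows (du_i*, dv_i*) are simply inputs here, standing in for whatever robust optical flow method is used.

```python
import numpy as np


def predicted_flow(p, P_tilde, E_w, Theta_tilde, fx, fy, cx, cy):
    """Equation (29): flow (du, dv) at pixel p predicted from E(w~) and Theta~."""
    q = E_w @ (P_tilde - Theta_tilde)   # rows E_1, E_2, E_3 applied at once
    return np.array([fx * q[0] / q[2] + cx - p[0], fy * q[1] / q[2] + cy - p[1]])


def matching_cost(pixels, points, measured_flows, E_w, Theta_tilde, intr):
    """Equations (28) and (30): sum over features of the squared flow mismatch."""
    return sum(float(np.sum((predicted_flow(p, P, E_w, Theta_tilde, *intr) - f) ** 2))
               for p, P, f in zip(pixels, points, measured_flows))


intr = (500.0, 500.0, 320.0, 240.0)          # hypothetical fx, fy, cx, cy
flow = predicted_flow((320.0, 240.0), np.array([0.0, 0.0, 5.0]), np.eye(3),
                      np.array([0.1, 0.0, 0.0]), *intr)
print(flow)  # about [-10, 0]: a 0.1 m sideways camera move shifts a point 5 m ahead by -10 px
```

Because the cost compares flows rather than pixel intensities, the image data enter only through the measured flows, which is the separation of steps described in the next paragraph.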

Equations (28) to (30) show that the cost function in some embodiments differs from the intensity-based cost function of traditional approaches. With the cost function of such embodiments, the algorithm separates the estimation of the optical flow $(du_{i}^{*},dv_{i}^{*})$ and the estimation of $(\tilde{\omega},\tilde{\Theta})$ into two independent steps, rather than combining them as in traditional approaches. This improves the robustness of the algorithm, because the optical flow $(du_{i}^{*},dv_{i}^{*})$ can be obtained via a traditional robust method. Such robustness may be needed for outdoor UAV systems, especially in the presence of changing lighting conditions, moving objects, and noise in the outdoor environment.

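Using the following first-order Taylor expansion about the initial values $\tilde{\omega}^{(0)}$ and $\tilde{\Theta}^{(0)}$ (with $E_{j}^{(0)}$ denoting the jth row of $E(\tilde{\omega}^{(0)})$)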

$\begin{matrix} {\begin{aligned} du_{i} &\approx du_{i}^{(0)} + \gamma_{1}^{T}(u_{i},v_{i})(\tilde{\omega} - \tilde{\omega}^{(0)}) + \gamma_{2}^{T}(u_{i},v_{i})(\tilde{\Theta} - \tilde{\Theta}^{(0)}), \\ du_{i}^{(0)} &= f_{x}\frac{E_{1}^{(0)}\tilde{P}_{i}(k_{m}) - E_{1}^{(0)}\tilde{\Theta}^{(0)}}{E_{3}^{(0)}\tilde{P}_{i}(k_{m}) - E_{3}^{(0)}\tilde{\Theta}^{(0)}} + c_{x} - u_{i}, \\ \gamma_{1}(u_{i},v_{i}) &= \left.\frac{\partial du_{i}}{\partial\tilde{\omega}}\right|_{\tilde{\omega}=\tilde{\omega}^{(0)},\,\tilde{\Theta}=\tilde{\Theta}^{(0)}},\quad \gamma_{2}(u_{i},v_{i}) = \left.\frac{\partial du_{i}}{\partial\tilde{\Theta}}\right|_{\tilde{\omega}=\tilde{\omega}^{(0)},\,\tilde{\Theta}=\tilde{\Theta}^{(0)}}, \\ dv_{i} &\approx dv_{i}^{(0)} + \gamma_{3}^{T}(u_{i},v_{i})(\tilde{\omega} - \tilde{\omega}^{(0)}) + \gamma_{4}^{T}(u_{i},v_{i})(\tilde{\Theta} - \tilde{\Theta}^{(0)}), \\ dv_{i}^{(0)} &= f_{y}\frac{E_{2}^{(0)}\tilde{P}_{i}(k_{m}) - E_{2}^{(0)}\tilde{\Theta}^{(0)}}{E_{3}^{(0)}\tilde{P}_{i}(k_{m}) - E_{3}^{(0)}\tilde{\Theta}^{(0)}} + c_{y} - v_{i}, \\ \gamma_{3}(u_{i},v_{i}) &= \left.\frac{\partial dv_{i}}{\partial\tilde{\omega}}\right|_{\tilde{\omega}=\tilde{\omega}^{(0)},\,\tilde{\Theta}=\tilde{\Theta}^{(0)}},\quad \gamma_{4}(u_{i},v_{i}) = \left.\frac{\partial dv_{i}}{\partial\tilde{\Theta}}\right|_{\tilde{\omega}=\tilde{\omega}^{(0)},\,\tilde{\Theta}=\tilde{\Theta}^{(0)}}, \end{aligned}} & (31) \end{matrix}$

where the matching error in the equation (28) may be approximated as

$\begin{matrix} {\delta I_i(\tilde{\omega},\tilde{\Theta}) = \left(du_i^{(0)} + \gamma_1^T(u_i,v_i)(\tilde{\omega} - \tilde{\omega}^{(0)}) + \gamma_2^T(u_i,v_i)(\tilde{\Theta} - \tilde{\Theta}^{(0)}) - du_i^*\right)^2 + \left(dv_i^{(0)} + \gamma_3^T(u_i,v_i)(\tilde{\omega} - \tilde{\omega}^{(0)}) + \gamma_4^T(u_i,v_i)(\tilde{\Theta} - \tilde{\Theta}^{(0)}) - dv_i^*\right)^2} & (32) \end{matrix}$

The optimal values of $\tilde{\omega}$ and $\tilde{\Theta}$ are then computed as

$\begin{matrix} {{\begin{bmatrix} {\overset{\sim}{\omega}}^{*} \\ {\overset{\sim}{\Theta}}^{*} \end{bmatrix} = {\begin{bmatrix} {\overset{\sim}{\omega}}^{(0)} \\ {\overset{\sim}{\Theta}}^{(0)} \end{bmatrix} - {M^{- 1}ɛ}}},} & (33) \end{matrix}$

where the matrix M and the vector ε are defined as

$\begin{matrix} {{M = {\sum\limits_{i}\left( {{\begin{bmatrix} {\gamma_{1}\left( {u_{i},v_{i}} \right)} \\ {\gamma_{2}\left( {u_{i},v_{i}} \right)} \end{bmatrix}\left\lbrack \begin{matrix} {\gamma_{1}^{T}\left( {u_{i},v_{i}} \right)} & {\gamma_{2}^{T}\left( {u_{i},v_{i}} \right)} \end{matrix} \right\rbrack} + {{\quad\left\lbrack \begin{matrix} {\gamma_{3}\left( {u_{i},v_{i}} \right)} \\ {\gamma_{4}\left( {u_{i},v_{i}} \right)} \end{matrix} \right\rbrack\quad}\left\lbrack \begin{matrix} {\gamma_{3}^{T}\left( {u_{i},v_{i}} \right)} & {\gamma_{4}^{T}\left( {u_{i},v_{i}} \right)} \end{matrix} \right\rbrack}} \right)}}{ɛ = {\sum\limits_{i}{\left( {{\left( {{du}_{i}^{(0)} - {du}_{i}^{*}} \right)\begin{bmatrix} {\gamma_{1}\left( {u_{i},v_{i}} \right)} \\ {\gamma_{2}\left( {u_{i},v_{i}} \right)} \end{bmatrix}} + {\left( {{dv}_{i}^{(0)} - {dv}_{i}^{*}} \right)\begin{bmatrix} {\gamma_{3}\left( {u_{i},v_{i}} \right)} \\ {\gamma_{4}\left( {u_{i},v_{i}} \right)} \end{bmatrix}}} \right).}}}} & (34) \end{matrix}$
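As an illustration of equations (33) and (34), the following minimal Python sketch assembles M and ε from per-feature gradient vectors and residuals and takes one update step. It assumes the stacked gradients [γ₁; γ₂] and [γ₃; γ₄] have already been evaluated for every feature; the helper names `solve_linear` and `gauss_newton_step` are hypothetical, and the linear system M δ = ε is solved directly instead of forming M⁻¹.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def solve_linear(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Use the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("M is singular: the features do not constrain all variables")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - acc) / aug[r][r]
    return x


def gauss_newton_step(x0: Sequence[float],
                      grads_u: Sequence[Sequence[float]],
                      grads_v: Sequence[Sequence[float]],
                      res_u: Sequence[float],
                      res_v: Sequence[float]) -> Vector:
    """Return x0 - M^{-1} eps with M and eps assembled as in equation (34).

    grads_u[i] stacks [gamma_1; gamma_2] for feature i, grads_v[i] stacks
    [gamma_3; gamma_4]; res_u[i] = du_i^(0) - du_i^* and res_v[i] = dv_i^(0) - dv_i^*.
    """
    n = len(x0)
    m = [[0.0] * n for _ in range(n)]
    eps = [0.0] * n
    for gu, gv, ru, rv in zip(grads_u, grads_v, res_u, res_v):
        for p in range(n):
            eps[p] += ru * gu[p] + rv * gv[p]
            for q in range(n):
                m[p][q] += gu[p] * gu[q] + gv[p] * gv[q]
    # Solve M @ delta = eps rather than inverting M explicitly.
    delta = solve_linear(m, eps)
    return [xi - di for xi, di in zip(x0, delta)]
```

When the flow is exactly linear in the unknowns, this single step lands on the exact solution; for the six unknowns of (33), `x0` would hold $\tilde{\omega}^{(0)}$ followed by $\tilde{\Theta}^{(0)}$.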

The optimal solution can also be obtained iteratively as follows:

$\begin{matrix} {{\begin{bmatrix} {\overset{\sim}{\omega}}^{{(j)},*} \\ {\overset{\sim}{\Theta}}^{{(j)},*} \end{bmatrix} = {\begin{bmatrix} {\overset{\sim}{\omega}}^{{({j - 1})},*} \\ {\overset{\sim}{\Theta}}^{{({j - 1})},*} \end{bmatrix} - {M^{- 1}ɛ_{j - 1}}}},} & (35) \end{matrix}$

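where the vector $\varepsilon_{j-1}$ and the gradients $\gamma_1, \ldots, \gamma_4$, now evaluated at the (j-1)th iterate, are defined as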

$\begin{matrix} {\begin{aligned} \varepsilon_{j-1} &= \sum_i \left( \left(du_i^{(j-1),*} - du_i^*\right) \begin{bmatrix} \gamma_1(u_i,v_i) \\ \gamma_2(u_i,v_i) \end{bmatrix} + \left(dv_i^{(j-1),*} - dv_i^*\right) \begin{bmatrix} \gamma_3(u_i,v_i) \\ \gamma_4(u_i,v_i) \end{bmatrix} \right), \\ \gamma_1(u_i,v_i) &= \left.\frac{\partial du_i}{\partial \tilde{\omega}}\right|_{\tilde{\omega} = \tilde{\omega}^{(j-1),*},\, \tilde{\Theta} = \tilde{\Theta}^{(j-1),*}}, \quad \gamma_2(u_i,v_i) = \left.\frac{\partial du_i}{\partial \tilde{\Theta}}\right|_{\tilde{\omega} = \tilde{\omega}^{(j-1),*},\, \tilde{\Theta} = \tilde{\Theta}^{(j-1),*}}, \\ \gamma_3(u_i,v_i) &= \left.\frac{\partial dv_i}{\partial \tilde{\omega}}\right|_{\tilde{\omega} = \tilde{\omega}^{(j-1),*},\, \tilde{\Theta} = \tilde{\Theta}^{(j-1),*}}, \quad \gamma_4(u_i,v_i) = \left.\frac{\partial dv_i}{\partial \tilde{\Theta}}\right|_{\tilde{\omega} = \tilde{\omega}^{(j-1),*},\, \tilde{\Theta} = \tilde{\Theta}^{(j-1),*}}. \end{aligned}} & (36) \end{matrix}$

With the above derivation, the values of $\tilde{\omega}$ and $\tilde{\Theta}$ of the (m+1)th frame may be refined as follows:

Step 1. Initialization.

$\begin{matrix} {\tilde{\omega}^{(0),*} = \tilde{\omega}^{(0)}, \quad \tilde{\Theta}^{(0),*} = \tilde{\Theta}^{(0)}, \quad j = 1.} & (37) \end{matrix}$

Step 2. Iteration. Update

$\begin{matrix} {\begin{aligned} du_i^{(j-1),*} &= f_x \frac{E_1^{(j-1),*}\tilde{P}_i(k_m) - E_1^{(j-1),*}\tilde{\Theta}^{(j-1),*}}{E_3^{(j-1),*}\tilde{P}_i(k_m) - E_3^{(j-1),*}\tilde{\Theta}^{(j-1),*}} + c_x - u_i, \\ dv_i^{(j-1),*} &= f_y \frac{E_2^{(j-1),*}\tilde{P}_i(k_m) - E_2^{(j-1),*}\tilde{\Theta}^{(j-1),*}}{E_3^{(j-1),*}\tilde{P}_i(k_m) - E_3^{(j-1),*}\tilde{\Theta}^{(j-1),*}} + c_y - v_i, \end{aligned}} & (38) \end{matrix}$

and compute the values of $\tilde{\omega}^{(j),*}$ and $\tilde{\Theta}^{(j),*}$ via the equations (35) and (36), and increase the value of j by 1. In some embodiments, the angular velocity and linear velocity of the (m+1)th frame may be refined/optimized at 106.
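The iteration of Steps 1 and 2 can be sketched in Python as follows. This is a hedged illustration rather than the described implementation: the γ vectors are approximated with central finite differences instead of analytic derivatives, M is re-assembled at every iterate (equation (35) as written shows a single M), and the translation-only toy model (E = I, focal length `F`, points `POINTS`) is hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def solve_linear(a: List[Vector], b: Vector) -> Vector:
    """Gaussian elimination with partial pivoting for a @ x = b."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def numerical_gradient(f: Callable[[Vector], float], x: Sequence[float],
                       h: float = 1e-6) -> Vector:
    """Central-difference gradient, standing in for the analytic gamma vectors."""
    grad = []
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad


def refine(x0, predict_du, predict_dv, du_star, dv_star, max_iter=20, tol=1e-12):
    """Iterate equations (35)-(38), re-linearizing at the latest estimate."""
    x = list(x0)
    n = len(x)
    for _ in range(max_iter):
        m = [[0.0] * n for _ in range(n)]
        eps = [0.0] * n
        for i in range(len(du_star)):
            gu = numerical_gradient(lambda y: predict_du(y, i), x)
            gv = numerical_gradient(lambda y: predict_dv(y, i), x)
            ru = predict_du(x, i) - du_star[i]   # du_i^{(j-1),*} - du_i^*
            rv = predict_dv(x, i) - dv_star[i]
            for p in range(n):
                eps[p] += ru * gu[p] + rv * gv[p]
                for q in range(n):
                    m[p][q] += gu[p] * gu[q] + gv[p] * gv[q]
        delta = solve_linear(m, eps)
        x = [xi - di for xi, di in zip(x, delta)]
        if max(abs(d) for d in delta) < tol:
            break
    return x


# Hypothetical toy problem: pure translation (E = I) of a pinhole camera,
# so du_i and dv_i follow the structure of equation (38).
F = 500.0
POINTS = [(1.0, 0.5, 5.0), (-1.0, 0.2, 4.0), (0.5, -1.0, 6.0),
          (-0.7, -0.6, 3.0), (0.2, 0.8, 7.0)]


def toy_du(theta: Sequence[float], i: int) -> float:
    x, _, z = POINTS[i]
    return F * ((x - theta[0]) / (z - theta[2]) - x / z)


def toy_dv(theta: Sequence[float], i: int) -> float:
    _, y, z = POINTS[i]
    return F * ((y - theta[1]) / (z - theta[2]) - y / z)
```

Starting from zero motion, `refine([0.0, 0.0, 0.0], toy_du, toy_dv, du_star, dv_star)` recovers the translation that generated noise-free flows `du_star`, `dv_star`.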

The estimated pose tends to drift away from the real position of the apparatus over time as small errors accumulate. To reduce the drift of the current frame using the nearest key frame (as described above with reference to 110), a method of some embodiments corrects the values of $\tilde{\omega}$ and $\tilde{\Theta}$ of the (m+1)th frame using the key frame. The objective of correcting the angular velocity and linear velocity of the current frame is to reduce the drift in its rotation matrix and translation vector. The corrected values may not be accurate estimates of the true velocities; this is acceptable because the corrected angular velocity and linear velocity are used only to compute the rotation matrix and translation vector of the current frame.

The initial values of the rotation matrix $R(k_{m+1})$ and the translation vector $T(k_{m+1})$ can be computed as

$\begin{matrix} {R^{(0)}(k_{m+1}) = E^{*} R(k_m), \quad T^{(0)}(k_{m+1}) = -E^{*}\tilde{\Theta}^{*}.} & (39) \end{matrix}$

The nearest key frame may be defined as the key frame that shares the largest possible captured region with the current (m+1)th frame. To reduce the computation cost, the κ key frames with the smallest center distance to the current frame may be selected first. The center distance may be defined as

$\begin{matrix} {d_c(k_{m+1}, r) = \left(X_c(k_{m+1}) - X_c(r)\right)^2 + \left(Y_c(k_{m+1}) - Y_c(r)\right)^2 + \left(Z_c(k_{m+1}) - Z_c(r)\right)^2,} & (40) \end{matrix}$

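where r denotes a key frame, and the coordinate of the current camera center in the world reference frame is computed as

$\begin{bmatrix} X_c(k_{m+1}) & Y_c(k_{m+1}) & Z_c(k_{m+1}) \end{bmatrix}^T = -R^{(0),T}(k_{m+1})\,T^{(0)}(k_{m+1}).$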

Among the selected κ key frames, the one whose observation angle is most similar to that of the current (m+1)th frame is chosen as the nearest key frame. After the nearest reference/key frame is selected, good feature points in the common captured region between the current frame and the nearest reference frame are used to refine the pose of the current frame. All the good common feature points may be detected as described below.
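The two-stage key frame selection can be sketched as below. The source does not define the observation angle; this sketch assumes it is the camera's optical axis expressed in the world frame and measures similarity by the cosine between unit axes. The `KeyFrame` record and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class KeyFrame:
    frame_id: int
    center: Vec3    # camera center in the world frame
    view_dir: Vec3  # unit optical-axis direction in the world frame (assumed)


def camera_center(r: Sequence[Sequence[float]], t: Sequence[float]) -> Vec3:
    """Camera center -R^T T for a world-to-camera rotation R and translation T."""
    return tuple(-sum(r[row][col] * t[row] for row in range(3)) for col in range(3))


def view_direction(r: Sequence[Sequence[float]]) -> Vec3:
    """Optical axis R^T [0 0 1]^T in the world frame, i.e. the third row of R."""
    return tuple(float(v) for v in r[2])


def center_distance(c1: Vec3, c2: Vec3) -> float:
    """Squared distance between two camera centers, as in equation (40)."""
    return sum((a - b) ** 2 for a, b in zip(c1, c2))


def nearest_key_frame(center: Vec3, view_dir: Vec3,
                      key_frames: List[KeyFrame], kappa: int) -> KeyFrame:
    # Stage 1: keep the kappa key frames whose centers are closest.
    candidates = sorted(key_frames,
                        key=lambda kf: center_distance(center, kf.center))[:kappa]
    # Stage 2: pick the most similar viewing direction (largest cosine).
    return max(candidates,
               key=lambda kf: sum(a * b for a, b in zip(view_dir, kf.view_dir)))
```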

The coordinate of the top left pixel $m_{i_0} = [0\;\; 0]^T$ of the current (m+1)th image in the world reference frame is computed as

$\begin{matrix} {{{{\overset{\sim}{P}}_{i_{0}}\left( k_{m + 1} \right)} = \begin{bmatrix} {{- \frac{c_{x}}{f_{x}}}{{\overset{\sim}{Z}}_{i_{0}}\left( k_{m + 1} \right)}} & {{- \frac{c_{y}}{f_{y}}}{{\overset{\sim}{Z}}_{i_{0}}\left( k_{m + 1} \right)}} & {{\overset{\sim}{Z}}_{i_{0}}\left( k_{m + 1} \right)} \end{bmatrix}^{T}}{P_{i_{0}}\left( k_{m + 1} \right)} = {{R^{{(0)},T}\left( k_{m + 1} \right)}{\left( {{{\overset{\sim}{P}}_{i_{0}}\left( k_{m + 1} \right)} - {T^{(0)}\left( k_{m + 1} \right)}} \right).}}} & (41) \end{matrix}$

Similarly, the world coordinates of the bottom left pixel $m_{i_1} = [0\;\; H]^T$, the top right pixel $m_{i_2} = [W\;\; 0]^T$, and the bottom right pixel $m_{i_3} = [W\;\; H]^T$ are computed, where W and H are the image width and height.

The visible region of the (m+1)th frame may be determined by the world coordinates of the four pixels $m_{i_0}$, $m_{i_1}$, $m_{i_2}$, $m_{i_3}$. A good feature point in the key frame is a common one if its coordinate in the world reference frame lies in the visible region of the current (m+1)th frame. Once all the good common feature points are available, the pose of the (m+1)th frame is refined.
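A minimal sketch of the common-point test, assuming the scene is close to a ground plane so that the visible region can be treated as the convex quadrilateral spanned by the (X, Y) world coordinates of the four corner pixels, with Z taken as the vertical axis. These modelling choices and the function names are assumptions for illustration.

```python
from typing import Sequence, Tuple

Point2 = Tuple[float, float]


def _cross(o: Point2, a: Point2, b: Point2) -> float:
    """z-component of (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_convex_polygon(p: Point2, polygon: Sequence[Point2]) -> bool:
    """True if p lies inside or on the boundary of a convex polygon."""
    sides = []
    for k in range(len(polygon)):
        c = _cross(polygon[k], polygon[(k + 1) % len(polygon)], p)
        if c != 0.0:
            sides.append(c > 0.0)
    # Inside when p is on the same side of every edge.
    return all(sides) or not any(sides)


def is_common_point(point_world: Sequence[float],
                    corners_world: Sequence[Sequence[float]]) -> bool:
    """corners_world holds the world coordinates of m_i0, m_i1, m_i2, m_i3."""
    m0, m1, m2, m3 = corners_world
    # Walk m_i0 -> m_i2 -> m_i3 -> m_i1 so the corners form a simple
    # (non-self-intersecting) quadrilateral on the ground plane.
    footprint = [(c[0], c[1]) for c in (m0, m2, m3, m1)]
    return in_convex_polygon((point_world[0], point_world[1]), footprint)
```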

For the matching of the ith feature point, an initial value may be computed as follows:

$\begin{matrix} {\begin{aligned} \begin{bmatrix} \tilde{X}_i \\ \tilde{Y}_i \\ \tilde{Z}_i \end{bmatrix} &= R^{(0)}(k_{m+1}) \begin{bmatrix} X_i - X_c(k_{m+1}) \\ Y_i - Y_c(k_{m+1}) \\ Z_i - Z_c(k_{m+1}) \end{bmatrix}, \\ u_i^0 &= f_x \frac{\tilde{X}_i}{\tilde{Z}_i} + c_x, \\ v_i^0 &= f_y \frac{\tilde{Y}_i}{\tilde{Z}_i} + c_y, \end{aligned}} & (42) \end{matrix}$
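Equation (42) is a pinhole projection of a world point into the current frame using the initial pose. A direct transcription, assuming $R^{(0)}(k_{m+1})$ is supplied as a 3×3 nested list and the camera center $-R^{(0),T}(k_{m+1})T^{(0)}(k_{m+1})$ has been precomputed, might look like this (the function name is hypothetical):

```python
from typing import Sequence, Tuple


def project_to_current_frame(point_world: Sequence[float],
                             r0: Sequence[Sequence[float]],
                             center: Sequence[float],
                             fx: float, fy: float,
                             cx: float, cy: float) -> Tuple[float, float]:
    """Initial match [u_i^0, v_i^0] of a world point, following equation (42)."""
    # Express the point relative to the current camera center.
    d = [point_world[k] - center[k] for k in range(3)]
    # Rotate into the camera frame with the initial rotation R^(0)(k_{m+1}).
    x_c, y_c, z_c = (sum(r0[row][k] * d[k] for k in range(3)) for row in range(3))
    if z_c <= 0.0:
        raise ValueError("point lies behind the camera")
    return fx * x_c / z_c + cx, fy * y_c / z_c + cy
```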

where the value of $[X_i\;\; Y_i\;\; Z_i]^T$ is computed by using the pixel $[u_i\;\; v_i]^T$ as well as the pose and height of the nearest key frame; it may also be stored together with the pixel $[u_i\;\; v_i]^T$. Using the initial value $[u_i^0\;\; v_i^0]^T$ reduces the search range.

The optimization problem is formulated as

$\begin{matrix} {{\begin{bmatrix} u_{i}^{\prime,*} \\ v_{i}^{\prime,*} \end{bmatrix} = {\underset{{\lbrack\begin{matrix} u_{i}^{\prime} & v_{i}^{\prime} \end{matrix}\rbrack} \in {\Omega {({\lbrack\begin{matrix} u_{i}^{0} & v_{i}^{0} \end{matrix}\rbrack})}}}{argmin}\left\{ {\frac{1}{2}{\sum\limits_{h,w}{{{I_{k}\left( {{u_{i}^{\prime} + w},{v_{i}^{\prime} + h}} \right)} - {I_{r}\left( {{u_{i} + w},{v_{i} + h}} \right)}}}^{2}}} \right\}}},{\forall{i.}}} & (43) \end{matrix}$
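For intuition about equation (43), the sketch below solves it by brute force: it evaluates the patch cost at every integer position in a square window Ω centered on the initial guess and keeps the minimum. This exhaustive search is a swapped-in technique for illustration only; the embodiments solve (43) with the inverse compositional Lucas-Kanade algorithm, which also gives sub-pixel positions. Images are assumed to be nested lists indexed as `image[v][u]`, and the patch half-size and window radius are assumed values.

```python
import math
from typing import List, Optional, Tuple

Image = List[List[float]]  # indexed as image[v][u]


def patch_cost(img_k: Image, img_r: Image, uk: int, vk: int,
               ur: int, vr: int, half: int) -> float:
    """Half the sum of squared intensity differences, as inside equation (43)."""
    total = 0.0
    for h in range(-half, half + 1):
        for w in range(-half, half + 1):
            diff = img_k[vk + h][uk + w] - img_r[vr + h][ur + w]
            total += diff * diff
    return 0.5 * total


def match_feature(img_k: Image, img_r: Image, u: int, v: int,
                  u0: int, v0: int, half: int = 2,
                  radius: int = 3) -> Optional[Tuple[int, int]]:
    """Search the window centered on (u0, v0) in img_k for the patch
    around (u, v) in the reference image img_r."""
    rows, cols = len(img_k), len(img_k[0])
    best, best_cost = None, math.inf
    for dv in range(-radius, radius + 1):
        for du in range(-radius, radius + 1):
            uk, vk = u0 + du, v0 + dv
            # Skip candidates whose patch would leave the image.
            if uk - half < 0 or vk - half < 0 or uk + half >= cols or vk + half >= rows:
                continue
            cost = patch_cost(img_k, img_r, uk, vk, u, v, half)
            if cost < best_cost:
                best, best_cost = (uk, vk), cost
    return best
```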

Let the warping matrix be denoted as

$\begin{matrix} {{\overset{\sim}{\omega}(p)} = {\begin{bmatrix} {1 + p_{1}} & p_{2} & p_{3} \\ p_{4} & {1 + p_{5}} & p_{6} \end{bmatrix}.}} & (44) \end{matrix}$

The initial value of the warping matrix may be obtained by solving the following optimization problem:

$\begin{matrix} {\underset{\overset{\sim}{\omega}{(p)}}{\arg \mspace{11mu} \min}{\left\{ {\frac{1}{2}{\sum\limits_{i}{{{{\overset{\sim}{\omega}(p)}\begin{bmatrix} u_{i}^{0} \\ v_{i}^{0} \\ 1 \end{bmatrix}} - \begin{bmatrix} u_{i} \\ v_{i} \end{bmatrix}}}^{2}}} \right\}.}} & (45) \end{matrix}$

The optimal value is given as

$\begin{matrix} {\begin{aligned} \begin{bmatrix} p_1^0 \\ p_2^0 \\ p_3^0 \end{bmatrix} &= \left( \sum_i \begin{bmatrix} u_i^0 \\ v_i^0 \\ 1 \end{bmatrix} \begin{bmatrix} u_i^0 & v_i^0 & 1 \end{bmatrix} \right)^{-1} \sum_i \left(u_i - u_i^0\right) \begin{bmatrix} u_i^0 \\ v_i^0 \\ 1 \end{bmatrix}, \\ \begin{bmatrix} p_4^0 \\ p_5^0 \\ p_6^0 \end{bmatrix} &= \left( \sum_i \begin{bmatrix} u_i^0 \\ v_i^0 \\ 1 \end{bmatrix} \begin{bmatrix} u_i^0 & v_i^0 & 1 \end{bmatrix} \right)^{-1} \sum_i \left(v_i - v_i^0\right) \begin{bmatrix} u_i^0 \\ v_i^0 \\ 1 \end{bmatrix}. \end{aligned}} & (46) \end{matrix}$
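Equation (46) is an ordinary linear least-squares fit in which both rows of the warp share the same 3×3 normal matrix. A minimal sketch with hypothetical function names, using Cramer's rule for the small system (an implementation choice, not part of the source):

```python
from typing import List, Sequence, Tuple


def _det3(m: Sequence[Sequence[float]]) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve the 3x3 system m x = b with Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("points are collinear; the warp is not determined")
    x = []
    for col in range(3):
        mc = [[b[r] if c == col else m[r][c] for c in range(3)] for r in range(3)]
        x.append(_det3(mc) / d)
    return x


def initial_warp(predicted: Sequence[Tuple[float, float]],
                 observed: Sequence[Tuple[float, float]]) -> List[float]:
    """Least-squares [p1, ..., p6] of equation (46).

    predicted[i] = (u_i^0, v_i^0) from equation (42); observed[i] = (u_i, v_i).
    """
    s = [[0.0] * 3 for _ in range(3)]
    bu = [0.0] * 3
    bv = [0.0] * 3
    for (u0, v0), (u, v) in zip(predicted, observed):
        x = (u0, v0, 1.0)
        for r in range(3):
            bu[r] += (u - u0) * x[r]
            bv[r] += (v - v0) * x[r]
            for c in range(3):
                s[r][c] += x[r] * x[c]
    return _solve3(s, bu) + _solve3(s, bv)
```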

The optimization problem in the equation (43) may be solved by using the inverse compositional Lucas-Kanade algorithm. The optimal values are given as

$\begin{matrix} {{u_{i}^{\prime,*} = \frac{{\left( {1 + p_{5}^{*}} \right)\left( {u_{i} - p_{3}^{*}} \right)} - {p_{2}^{*}\left( {v_{i} - p_{6}^{*}} \right)}}{{\left( {1 + p_{1}^{*}} \right)\left( {1 + p_{5}^{*}} \right)} - {p_{2}^{*}p_{4}^{*}}}}{v_{i}^{\prime,*} = {\frac{{\left( {1 + p_{1}^{*}} \right)\left( {v_{i} - p_{6}^{*}} \right)} - {p_{4}^{*}\left( {u_{i} - p_{3}^{*}} \right)}}{{\left( {1 + p_{1}^{*}} \right)\left( {1 + p_{5}^{*}} \right)} - {p_{2}^{*}p_{4}^{*}}}.}}} & (47) \end{matrix}$
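Equation (47) is the inverse of the affine map in (44), which a round trip confirms. A minimal sketch with hypothetical function names:

```python
from typing import Sequence, Tuple


def apply_warp(p: Sequence[float], u0: float, v0: float) -> Tuple[float, float]:
    """Affine warp of equation (44) applied to [u0, v0, 1]^T."""
    p1, p2, p3, p4, p5, p6 = p
    return (1 + p1) * u0 + p2 * v0 + p3, p4 * u0 + (1 + p5) * v0 + p6


def invert_warp(p: Sequence[float], u: float, v: float) -> Tuple[float, float]:
    """Matched position [u', v'] from equation (47), i.e. the inverse warp."""
    p1, p2, p3, p4, p5, p6 = p
    det = (1 + p1) * (1 + p5) - p2 * p4
    if abs(det) < 1e-12:
        raise ValueError("warp is not invertible")
    u_prime = ((1 + p5) * (u - p3) - p2 * (v - p6)) / det
    v_prime = ((1 + p1) * (v - p6) - p4 * (u - p3)) / det
    return u_prime, v_prime
```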

The feature correspondences established by the equation (43) could violate the epipolar constraint. The values of the angular velocity and linear velocity of the current frame are therefore corrected by using the epipolar constraint, i.e., by solving the following optimization problem:

$\begin{matrix} {\begin{aligned} \begin{bmatrix} \tilde{\omega}^{*} & \tilde{\Theta}^{*} \end{bmatrix}^T &= \underset{\tilde{\omega},\tilde{\Theta}}{\arg\min} \left\{ \frac{1}{2} \sum_i \left\| \begin{bmatrix} \frac{u_i^{\prime,*} - c_x}{f_x}\tilde{Z}_i(k_{m+1}) - \tilde{X}_i(k_{m+1}) \\ \frac{v_i^{\prime,*} - c_y}{f_y}\tilde{Z}_i(k_{m+1}) - \tilde{Y}_i(k_{m+1}) \end{bmatrix} \right\|^2 \right\}, \\ \begin{bmatrix} \tilde{X}_i(k_{m+1}) \\ \tilde{Y}_i(k_{m+1}) \\ \tilde{Z}_i(k_{m+1}) \end{bmatrix} &= \begin{bmatrix} E_1\tilde{P}_i(k_m) - E_1\tilde{\Theta} \\ E_2\tilde{P}_i(k_m) - E_2\tilde{\Theta} \\ E_3\tilde{P}_i(k_m) - E_3\tilde{\Theta} \end{bmatrix}. \end{aligned}} & (48) \end{matrix}$

Similar to the optimization problem in the equation (30), the above optimization problem may be solved using the first-order Taylor expansion. The pose of the (m+1)th frame may be refined as

$\begin{matrix} {R^{*}(k_{m+1}) = E^{*} R(k_m), \quad T^{*}(k_{m+1}) = -E^{*}\tilde{\Theta}^{*}.} & (49) \end{matrix}$

In some embodiments, the drift of the current frame may be corrected at 110.

Employing an inertial measurement unit as an additional sensor may dramatically improve both the reliability and the accuracy of traditional localization solutions. In another embodiment, a further model is developed for autonomous ground vehicles (AGVs), and motion pre-integration is provided for the AGV using the model and the IMU data. The pre-integration may be used to determine the initial values of the optimized variables, which allows a nonlinear optimization problem to be converted into a quadratic optimization problem. A quadratic optimization problem is much easier to solve than a nonlinear one. Thus, the motion pre-integration of such embodiments is very helpful for AGVs.

A wheeled AGV is assumed to undergo planar motion. In this example, a further discrete dynamic model is derived for the AGV, once again using linear algebra and linear control system theory.

Let $V = [V_1\;\; V_2]^T$ be the linear velocity in the world frame and $R = [R_1\;\; R_2]$ be the rotation matrix from the world frame to the body frame, with columns $R_1$ and $R_2$. Their dynamics are represented as

$\begin{matrix} {{\begin{bmatrix} \overset{.}{V} \\ \overset{.}{R_{1}} \\ \overset{.}{R_{2}} \end{bmatrix} = \begin{bmatrix} {R^{T}a} \\ {{- {{sk}(\omega)}}R_{1}} \\ {{- {{sk}(\omega)}}R_{2}} \end{bmatrix}},} & (50) \end{matrix}$

where $a = [a_1\;\; a_2]^T$ is the linear acceleration in the body frame, measured with an accelerometer (e.g., in the IMU 120), ω is the angular velocity of the AGV, measured by a gyroscope (e.g., in the IMU 120), and the matrix sk(ω) is defined as

$\begin{matrix} {{{sk}(\omega)} = {\begin{bmatrix} 0 & {- \omega} \\ \omega & 0 \end{bmatrix}.}} & (51) \end{matrix}$

Using matrix theory, it can be derived that

$\begin{matrix} {{R^{T}a} = {\begin{bmatrix} {R_{1}^{T}a} \\ {R_{2}^{T}a} \end{bmatrix} = \begin{bmatrix} {a^{T}R_{1}} \\ {a^{T}R_{2}} \end{bmatrix}}} & (52) \end{matrix}$

It follows that

$\begin{matrix} {{\begin{bmatrix} \overset{.}{V} \\ {\overset{.}{R}}_{1} \\ {\overset{.}{R}}_{2} \end{bmatrix} = {{A\left( {a,\omega} \right)}\begin{bmatrix} V \\ R_{1} \\ R_{2} \end{bmatrix}}},} & (53) \end{matrix}$

where the matrix A(a, ω) is given by

$\begin{matrix} {{{A\left( {a,\omega} \right)} = \begin{bmatrix} 0 & {{\hat{A}}_{1}(a)} & {{\hat{A}}_{2}(a)} \\ 0 & {- {{sk}(\omega)}} & 0 \\ 0 & 0 & {- {{sk}(\omega)}} \end{bmatrix}},{{{\hat{A}}_{1}(a)} = \begin{bmatrix} a_{1} & a_{2} \\ 0 & 0 \end{bmatrix}},{{{\hat{A}}_{2}(a)} = {\begin{bmatrix} 0 & 0 \\ a_{1} & a_{2} \end{bmatrix}.}}} & (54) \end{matrix}$

When the values of a and ω are fixed, it can be easily derived that

$\begin{matrix} {{\exp^{A{({a,\omega})}} = \begin{bmatrix} I & {\overset{\sim}{A_{1}}(a)} & {{\overset{\sim}{A}}_{2}\left( {a,\omega} \right)} \\ 0 & {E(\omega)} & 0 \\ 0 & 0 & {E(\omega)} \end{bmatrix}}{{{{\overset{\sim}{A}}_{1}\left( {a,\omega} \right)} = \left\lbrack {\Gamma^{T}\mspace{14mu} (\omega)a\mspace{14mu} 0} \right\rbrack^{T}},}} & (55) \\ {{{{\overset{\sim}{A}}_{2}\left( {a,\omega} \right)} = \left\lbrack {0\mspace{14mu} {\Gamma \;}^{T}\mspace{11mu} (\omega)a} \right\rbrack^{T}},} & \; \end{matrix}$

where I is a 2×2 identity matrix, the matrix E(ω) is given as

$\begin{matrix} {{{E(\omega)} = {I - {\frac{\sin \; \omega}{\omega}{{sk}(\omega)}} + {\frac{1 - {\cos \; \omega}}{\omega^{2}}\left( {{sk}(\omega)} \right)^{2}}}},} & (56) \end{matrix}$

and the matrix Γ(ω) is given as

$\begin{matrix} {{\Gamma (\omega)} = {I + {\frac{{\cos \; \omega} - 1}{\omega^{2}}{{sk}(\omega)}} + {\frac{\omega - {\sin \; \omega}}{\omega^{3}}{\left( {{sk}(\omega)} \right)^{2}.}}}} & (57) \end{matrix}$

Considering two discrete time instances k and k+1 separated by an interval ΔT, it follows from linear control system theory that

$\begin{matrix} {{\begin{bmatrix} {V\left( {k + 1} \right)} \\ {R_{1}\left( {k + 1} \right)} \\ {R_{2}\left( {k + 1} \right)} \end{bmatrix} = {\exp^{A{({\overset{\sim}{d},\overset{\sim}{\omega}})}}\begin{bmatrix} {V(k)} \\ {R_{1}(k)} \\ {R_{2}(k)} \end{bmatrix}}},} & (58) \end{matrix}$

where $\tilde{a} = \Delta T\,a$ and $\tilde{\omega} = \Delta T\,\omega$.

The dynamics of the AGV position are now addressed. Let $P_c = [X_c\;\; Y_c]^T$ be the position. The dynamics of $P_c$ are modelled as

$\begin{matrix} {\dot{P}_c = V.} & (59) \end{matrix}$

Combining the equation (59) with the equation (53), it can be obtained that

$\begin{matrix} {{\begin{bmatrix} {\overset{.}{P}}_{c} \\ \overset{.}{V} \\ {\overset{.}{R}}_{1} \\ {\overset{.}{R}}_{2} \end{bmatrix} = {{B\left( {1,a,\omega} \right)}\begin{bmatrix} P_{c} \\ V \\ R_{1} \\ R_{2} \end{bmatrix}}},} & (60) \end{matrix}$

where the matrix B(s, a, ω) is given by

$\begin{matrix} {{B\left( {s,a,\omega} \right)} = \begin{bmatrix} 0 & {sI} & 0 & 0 \\ 0 & 0 & {{\hat{A}}_{1}(a)} & {{\hat{A}}_{2}(a)} \\ 0 & 0 & {- {{sk}(\omega)}} & 0 \\ 0 & 0 & 0 & {- {{sk}(\omega)}} \end{bmatrix}} & (61) \end{matrix}$

The model of some embodiments is now compared with a traditional model. Let θ(t) be the orientation of the AGV with respect to the world frame. The pose of the AGV (e.g., a mobile robot) is represented by the vector [X_(c)(t) Y_(c)(t) θ(t)]^(T) and the kinematics of the AGV are given as

$\begin{matrix} {{\begin{bmatrix} {{\overset{.}{X}}_{c}(t)} \\ {{\overset{.}{Y}}_{c}(t)} \\ \overset{.}{\theta (t)} \end{bmatrix} = {\begin{bmatrix} {\cos (\theta)} & 0 \\ {\sin \; (\theta)} & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} {v(t)} \\ {\omega (t)} \end{bmatrix}}},} & (62) \end{matrix}$

where v(t) and ω(t) are the linear and angular velocities of the AGV, respectively; they are the control inputs of the system. The model in the equation (62) is a nonlinear system, while the model in the equation (60) is a linear system.

Similarly, using matrix theory, it can be derived that

$\begin{matrix} {{\exp^{B{({s,a,\omega})}} = \begin{bmatrix} I & {sI} & {{\overset{\sim}{B}}_{1}\left( {s,a,\omega} \right)} & {{\overset{\sim}{B}}_{2}\left( {s,a,\omega} \right)} \\ 0 & \; & \exp^{A{({a,\omega})}} & \; \end{bmatrix}},} & (63) \end{matrix}$

where the matrices $\tilde{B}_i(s,a,\omega)$ (i = 1, 2) are computed as

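$\begin{matrix} {\begin{aligned} \tilde{B}_1(s,a,\omega) &= \begin{bmatrix} s\Lambda^T(\omega)\,a & 0 \end{bmatrix}^T, \\ \tilde{B}_2(s,a,\omega) &= \begin{bmatrix} 0 & s\Lambda^T(\omega)\,a \end{bmatrix}^T, \end{aligned}} & (64) \end{matrix}$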

and the matrix Λ(ω) is given as

$\begin{matrix} {{\Lambda (\omega)} = {\frac{I}{2} + {\frac{{\sin \; \omega} - \omega}{\omega^{3}}{{sk}(\omega)}} + {\frac{{\cos \; \omega} - 1 + \frac{\omega^{2}}{2}}{\omega^{4}}{\left( {{sk}(\omega)} \right)^{2}.}}}} & (65) \end{matrix}$

Using linear control theory, a new discrete model is obtained as

$\begin{matrix} {\begin{bmatrix} {P_{c}\left( {k + 1} \right)} \\ {V\left( {k + 1} \right)} \\ {R_{1}\left( {k + 1} \right)} \\ {R_{2}\left( {k + 1} \right)} \end{bmatrix} = {{\exp^{B{({{\Delta \; T},\overset{\_}{a},\overset{\_}{\omega}})}}\begin{bmatrix} {P_{c}(k)} \\ {V(k)} \\ {R_{1}(k)} \\ {R_{2}(k)} \end{bmatrix}}.}} & (66) \end{matrix}$

Traditional motion pre-integration methods are based on approximations and focus on unmanned aerial vehicles (UAVs). In this example, the model in the equation (66) may be used to pre-integrate all available data between two time instances. Without loss of generality, it is assumed that the interval length is nΔT and that the values of a and ω are constant within each sub-interval of length ΔT.

Using the dynamic model in the equation (66), it can be derived that

$\begin{matrix} {\begin{aligned} R_i(k_m+1) &= E(\tilde{\omega}(k_m))\,R_i(k_m), \\ V_i(k_m+1) &= V_i(k_m) + \tilde{a}^T(k_m)\,\Gamma(\tilde{\omega}(k_m))\,R_i(k_m), \\ P_{c,i}(k_m+1) &= P_{c,i}(k_m) + \tilde{V}_i(k_m) + \Delta T\,\tilde{a}^T(k_m)\,\Lambda(\tilde{\omega}(k_m))\,R_i(k_m), \end{aligned}} & (67) \end{matrix}$

where $k_m = mn$ and $\tilde{V}_i(k_m) = \Delta T\,V_i(k_m)$.
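The sub-interval update (67), chained over the n IMU samples between two frames, can be sketched as follows. It assumes a and ω are constant within each ΔT, as stated above, and stores the state as the columns R₁, R₂, the velocity V and the position P_c; all names are hypothetical. Because the model is exact for piecewise-constant inputs, splitting one interval into ten equal sub-intervals with the same inputs should reproduce the single-step result up to round-off.

```python
import math
from typing import List, Sequence, Tuple

Vec2 = List[float]
Mat2 = List[List[float]]
State = Tuple[Vec2, Vec2, Vec2, Vec2]  # (R_1, R_2, V, P_c)


def _coefficients(w: float) -> Tuple[float, float, float, float]:
    """sin w/w, (1-cos w)/w^2, (w-sin w)/w^3 and (cos w-1+w^2/2)/w^4."""
    if abs(w) < 0.1:  # series expansions avoid cancellation for small w
        w2 = w * w
        return (1 - w2 / 6 + w2 ** 2 / 120 - w2 ** 3 / 5040,
                0.5 - w2 / 24 + w2 ** 2 / 720 - w2 ** 3 / 40320,
                1 / 6 - w2 / 120 + w2 ** 2 / 5040 - w2 ** 3 / 362880,
                1 / 24 - w2 / 720 + w2 ** 2 / 40320 - w2 ** 3 / 3628800)
    s, c = math.sin(w), math.cos(w)
    return s / w, (1 - c) / w ** 2, (w - s) / w ** 3, (c - 1 + w * w / 2) / w ** 4


def _combine(k0: float, k1: float, k2: float, w: float) -> Mat2:
    """k0*I + k1*sk(w) + k2*sk(w)^2 for the 2x2 skew matrix sk(w)."""
    diag = k0 - k2 * w * w
    return [[diag, -k1 * w], [k1 * w, diag]]


def _mv(m: Mat2, x: Sequence[float]) -> Vec2:
    return [m[0][0] * x[0] + m[0][1] * x[1], m[1][0] * x[0] + m[1][1] * x[1]]


def step(state: State, a: Sequence[float], w: float, dt: float) -> State:
    """One sub-interval of equation (67) with body acceleration a and yaw rate w."""
    r1, r2, vel, pos = state
    a_t = [dt * a[0], dt * a[1]]          # a~ = dT * a
    w_t = dt * w                          # w~ = dT * w
    c1, c2, c3, c4 = _coefficients(w_t)
    e = _combine(1.0, -c1, c2, w_t)       # E, equation (56)
    g = _combine(1.0, -c2, c3, w_t)       # Gamma, equation (57)
    lam = _combine(0.5, -c3, c4, w_t)     # Lambda, equation (65)
    cols = (r1, r2)
    new_vel = [vel[i] + sum(x * y for x, y in zip(a_t, _mv(g, cols[i])))
               for i in range(2)]
    new_pos = [pos[i] + dt * vel[i]
               + dt * sum(x * y for x, y in zip(a_t, _mv(lam, cols[i])))
               for i in range(2)]
    return _mv(e, r1), _mv(e, r2), new_vel, new_pos


def preintegrate(state: State, samples, dt: float) -> State:
    """Chain equation (67) over the IMU samples (a, w) between two frames."""
    for a, w in samples:
        state = step(state, a, w, dt)
    return state
```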

Subsequently, it can be derived that

$\begin{matrix} {{{R_{i}\left( k_{m + 1} \right)} = {\prod\limits_{j = 0}^{n - 1}\; {{E\left( {\overset{\sim}{\omega}\left( {k_{m} + j} \right)} \right)}{R_{i}\left( k_{m} \right)}}}},{{P_{c,j}\left( k_{m + 1} \right)} = {{P_{c,i}\left( k_{m} \right)} + {\Theta_{i}\left( k_{m} \right)}}},} & (68) \end{matrix}$

where $\Theta_i(k_m)$, the position increment accumulated over the n sub-intervals, is computed by repeatedly applying the equation (67).

Using Lie algebra, it can be shown that there exists an $\hat{\omega}$ such that

$\begin{matrix} {{E\left( \hat{\omega} \right)} = {\prod\limits_{j = 0}^{n - 1}\; {{E\left( {\overset{\sim}{\omega}\left( {k_{m} + j} \right)} \right)}.}}} & (69) \end{matrix}$

The motion pre-integration of some embodiments described above is useful for both path planning and localization. For example, it may be used to compute the initial values of the optimized variables in localization. As described above, a nonlinear optimization problem may be converted into a quadratic optimization problem if the initial values of the optimized variables can be computed, and a quadratic optimization problem is easier to solve than a nonlinear one. Therefore, the motion pre-integration of some embodiments is useful for localization.

FIG. 2 is a flowchart 200 of a method of localization via visual inertial odometry. In some embodiments, operations of the method may include the operations described above with reference to FIG. 1. In some embodiments, the method may be performed by an apparatus (e.g., the apparatus 402/402′ shown in FIG. 4 or FIG. 5).

At 204, the apparatus may determine a first linear velocity of the apparatus and a first rotation matrix corresponding to a first video frame captured by a camera of the apparatus. In some embodiments, the operations performed at 204 may include the operations described above with reference to 102 in FIG. 1.

At 206, the apparatus may estimate a second linear velocity of the apparatus corresponding to a second video frame captured by the camera based on the first linear velocity, the first rotation matrix, and an angular velocity of the apparatus corresponding to the second video frame. The angular velocity may be provided by an inertial measurement unit of the apparatus. In some embodiments, the operations performed at 206 may include the operations described above with reference to 104 in FIG. 1.

At 208, the apparatus may construct an optical flow based on feature points across the first video frame and the second video frame.

At 210, the apparatus may refine the angular velocity and the second linear velocity via solving a quadratic optimization problem constructed based on the optical flow, the estimated second linear velocity, and the angular velocity provided by the IMU. In some embodiments, the quadratic optimization problem is converted from a non-linear optimization problem because the initial values of the optimized variables (i.e., the angular velocity provided by the IMU and the estimated second linear velocity) are known. In some embodiments, the operations performed at 208 and 210 may include the operations described above with reference to 106 in FIG. 1.

At 214, the apparatus may estimate the pose of the apparatus corresponding to the second video frame based on the refined angular velocity and the refined second linear velocity. In some embodiments, the operations performed at 214 may include the operations described above with reference to 108 in FIG. 1.

At 216, the apparatus may optionally reduce the drift of the estimated pose corresponding to the second video frame using the nearest key frame of the second video frame. In some embodiments, the nearest key frame of the second video frame may be a key frame that shares the largest captured region with the second video frame. In some embodiments, the drift of the estimated pose corresponding to the second video frame may be reduced using the height of the apparatus corresponding to the second video frame. In some embodiments, the operations performed at 216 may include the operations described below with reference to FIG. 3. In some embodiments, the operations performed at 216 may include the operations described above with reference to 110 in FIG. 1.

FIG. 3 is a flowchart 300 of a method of reducing the drift of an estimated pose corresponding to a current video frame. In some embodiments, operations of the method may include the operations described above with reference to 216 in FIG. 2. In some embodiments, the method may be performed by an apparatus (e.g., the apparatus 402/402′ shown in FIG. 4 or FIG. 5).

At 302, the apparatus may calculate initial values of the rotation matrix and the translation vector of the current video frame captured by a camera of the apparatus. In some embodiments, the initial value of the rotation matrix of the current video frame may be calculated based on the rotation matrix of a previous video frame. In some embodiments, the initial value of the translation vector of the current video frame may be calculated based on the refined linear velocity corresponding to the current video frame.

At 304, the apparatus may select the nearest key frame of the current video frame. In some embodiments, to select the nearest key frame, the apparatus may select a plurality of key frames with the smallest center distance to the current video frame, and select a key frame of the plurality of key frames with an observation angle that is most similar to that of the current video frame as the nearest key frame.

At 306, the apparatus may refine the pose corresponding to the current video frame using feature points in a common captured region between the current video frame and the nearest key frame. In some embodiments, to refine the pose corresponding to the current video frame, the apparatus may solve an optimization problem using the inverse compositional Lucas-Kanade algorithm and determine optimized values of the rotation matrix and the translation vector of the current video frame.

FIG. 4 is a conceptual data flow diagram 400 illustrating the data flow between different means/components in an exemplary apparatus 402. In some embodiments, the apparatus 402 may be an unmanned vehicle (e.g., UAV or AGV).

The apparatus 402 may include an initial estimation component 406 that estimates the initial values of the variables at least partially based on IMU data provided by the IMU 420. In one embodiment, the initial estimation component 406 may perform the operations described above with reference to 206 in FIG. 2.

The apparatus 402 may include a variable optimization component 408 that refines the initialized variables via solving an optimization problem. The initialized variables may be provided by the initial estimation component 406 and the IMU 420. In one embodiment, the variable optimization component 408 may perform the operations described above with reference to 208 or 210 in FIG. 2.

The apparatus 402 may include a pose estimation component 410 that estimates the initial pose of the apparatus corresponding to the current video frame captured by a camera of the apparatus. In some embodiments, the initial pose may be estimated based on the optimized variables provided by the variable optimization component 408 and the height of the apparatus provided by a height sensor 422. In some embodiments, the initial pose may be estimated further based on IMU data provided by the IMU 420. In one embodiment, the pose estimation component 410 may perform the operations described above with reference to 214 in FIG. 2.

The apparatus 402 may include a drift reduction component 412 that reduces the drift of the estimated pose using the nearest key frame. In some embodiments, the estimated pose may be corrected based on the height of the apparatus 402 provided by the height sensor 422. In one embodiment, the drift reduction component 412 may perform the operations described above with reference to 216 in FIG. 2.

The apparatus 402 may include additional components that perform each of the blocks of the algorithm in the aforementioned flowcharts of FIGS. 2 and 3. As such, each block in the aforementioned flowcharts of FIGS. 2 and 3 may be performed by a component and the apparatus may include one or more of those components. The components may be one or more hardware components specifically configured to carry out the stated processes/algorithm, implemented by a processor configured to perform the stated processes/algorithm, stored within a computer-readable medium for implementation by a processor, or some combination thereof.

FIG. 5 is a diagram 500 illustrating an example of a hardware implementation for an apparatus 402′ employing a processing system 514. In some embodiments, the apparatus 402′ may be the apparatus 402 described above with reference to FIG. 4. The apparatus 402′ may include one or more computing devices. The processing system 514 may be implemented with a bus architecture, represented generally by the bus 524. The bus 524 may include any number of interconnecting buses and bridges depending on the specific application of the processing system 514 and the overall design constraints. The bus 524 links together various circuits including one or more processors and/or hardware components, represented by the processor 504, the components 406, 408, 410, 412, the IMU 530, the sensors 532, the camera 534, the actuators 536, and the computer-readable medium/memory 506. The bus 524 may also link various other circuits such as timing sources, peripherals, voltage regulators, and power management circuits, which are well known in the art, and therefore, will not be described any further.

The processing system 514 may be coupled to a transceiver 510. The transceiver 510 is coupled to one or more antennas 520. The transceiver 510 provides a means for communicating with various other apparatus over a transmission medium. The transceiver 510 receives a signal from the one or more antennas 520, extracts information from the received signal, and provides the extracted information to the processing system 514. In addition, the transceiver 510 receives information from the processing system 514, and based on the received information, generates a signal to be applied to the one or more antennas 520.

The processing system 514 includes a processor 504 coupled to a computer-readable medium/memory 506. The processor 504 is responsible for general processing, including the execution of software stored on the computer-readable medium/memory 506. The software, when executed by the processor 504, causes the processing system 514 to perform the various functions described supra for any particular apparatus. The computer-readable medium/memory 506 may also be used for storing data that is manipulated by the processor 504 when executing software. The processing system 514 further includes at least one of the components 406, 408, 410, 412. The components may be software components running in the processor 504, resident/stored in the computer readable medium/memory 506, one or more hardware components coupled to the processor 504, or some combination thereof.

The IMU 530 may measure and report the angular velocity and linear acceleration of the apparatus 402′. The sensors 532 may include a height sensor and other sensors. The camera 534 may capture images or video frames, which can be analyzed for localization. The actuators 536 may include digital electronic speed controllers (which control the RPM of the motors) linked to motors/engines and propellers, servomotors (mostly for planes and helicopters), weapons, payload actuators, LEDs, and speakers.

In the following, various aspects of this disclosure will be illustrated:

Example 1 is a method or apparatus for localization via visual inertial odometry. The apparatus may be an unmanned vehicle. The apparatus may determine a first linear velocity of the apparatus and a first rotation matrix corresponding to a first video frame captured by a camera of the apparatus. The apparatus may estimate a second linear velocity of the apparatus corresponding to a second video frame captured by the camera based on the first linear velocity, the first rotation matrix, and the angular velocity of the apparatus corresponding to the second video frame. The angular velocity may be provided by an inertial measurement unit of the apparatus. The apparatus may construct an optical flow based on feature points across the first video frame and the second video frame. The apparatus may refine the angular velocity and the second linear velocity via solving a quadratic optimization problem constructed based on the optical flow, the estimated second linear velocity, and the angular velocity provided by the inertial measurement unit. The apparatus may estimate a pose of the apparatus corresponding to the second video frame based on the refined angular velocity and the refined second linear velocity.
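The quadratic refinement in Example 1 can be illustrated with a minimal sketch. It assumes that each tracked feature point is expressed in normalized image coordinates, that its depth is known (for example, from the height sensor and a flat-ground assumption), that the optical flow has been divided by the inter-frame interval to give image-plane velocities, and that both velocities are expressed in the camera frame. Under these assumptions the standard perspective motion-field model is linear in the linear and angular velocities, so a cost made of flow residuals plus weighted penalties pulling toward the predicted linear velocity and the IMU angular velocity is quadratic and is minimized by a single linear solve. The function names and the weights `lam_v` and `lam_w` are hypothetical; the exact cost used in the disclosure may differ.

```python
import numpy as np


def interaction_matrix(x, y, z):
    """2x6 motion-field Jacobian of a normalized image point (x, y) at depth z.

    Maps the camera twist [v; w] (linear and angular velocity, camera frame)
    to the image-plane velocity of the point.
    """
    return np.array([
        [-1.0 / z, 0.0, x / z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / z, y / z, 1.0 + y * y, -x * y, -x],
    ])


def refine_velocities(points, depths, flows, v_pred, w_imu, lam_v=1.0, lam_w=1.0):
    """Minimize sum_i ||L_i [v; w] - f_i||^2 + lam_v ||v - v_pred||^2 + lam_w ||w - w_imu||^2."""
    # Stack per-point 2x6 Jacobians; rows are ordered u_1, v_1, u_2, v_2, ...
    A = np.vstack([interaction_matrix(x, y, z) for (x, y), z in zip(points, depths)])
    b = np.asarray(flows, dtype=float).reshape(-1)  # same row order as A
    prior = np.concatenate([v_pred, w_imu]).astype(float)
    W = np.diag([lam_v] * 3 + [lam_w] * 3)
    # Setting the gradient of the quadratic cost to zero gives the normal equations
    # (A^T A + W) x = A^T b + W x_prior.
    x = np.linalg.solve(A.T @ A + W, A.T @ b + W @ prior)
    return x[:3], x[3:]
```

With noise-free flow and small weights the solution is driven by the optical flow; larger weights pull it toward the IMU-based predictions, which keeps the problem well posed when only a few features are tracked.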

In Example 2, the subject matter of Example 1 may optionally include that the apparatus may further reduce the drift of the estimated pose corresponding to the second video frame using the nearest key frame of the second video frame.

In Example 3, the subject matter of Example 2 may optionally include that the nearest key frame of the second video frame may be a key frame that shares the largest captured region with the second video frame.

In Example 4, the subject matter of any one of Examples 2 to 3 may optionally include that, to reduce the drift of the estimated pose, the apparatus may calculate initial values of a second rotation matrix and translation vector of the second video frame, select the nearest key frame of the second video frame, and refine the pose corresponding to the second video frame using feature points in a common captured region between the second video frame and the nearest key frame.

In Example 5, the subject matter of Example 4 may optionally include that the initial value of the second rotation matrix of the second video frame may be calculated based on the first rotation matrix of the first video frame.

In Example 6, the subject matter of any one of Examples 4 to 5 may optionally include that the initial value of the translation vector of the second video frame may be calculated based on the refined second linear velocity corresponding to the second video frame.
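Examples 5 and 6 can be sketched as follows, assuming that rotation matrices map the body frame to the world frame, that the refined velocities are expressed in the body frame of the second video frame, and that motion between the two frames is integrated to first order over the frame interval `dt`. The rotation is propagated from the first rotation matrix with the SO(3) exponential map (Rodrigues' formula), and the translation is advanced with the refined linear velocity rotated into the world frame. All names, frame conventions, and the first-order integration are assumptions for illustration.

```python
import numpy as np


def skew(w):
    """Skew-symmetric matrix such that skew(w) @ u == np.cross(w, u)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def so3_exp(phi):
    """Rodrigues' formula: rotation matrix for the rotation vector phi."""
    theta = np.linalg.norm(phi)
    if theta < 1e-12:
        return np.eye(3) + skew(phi)  # first-order approximation near zero
    K = skew(phi / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def initial_pose(R1, t1, w_refined, v_refined, dt):
    """Initial R2, t2 of the second frame from R1, t1 and the refined body-frame velocities."""
    # Example 5: rotation of the second frame derived from the first rotation matrix.
    R2 = R1 @ so3_exp(np.asarray(w_refined, dtype=float) * dt)
    # Example 6: translation advanced with the refined second linear velocity.
    t2 = np.asarray(t1, dtype=float) + R2 @ np.asarray(v_refined, dtype=float) * dt
    return R2, t2
```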

In Example 7, the subject matter of any one of Examples 4 to 6 may optionally include that, to select the nearest key frame, the apparatus may select a plurality of key frames with the smallest center distance to the second video frame, and select a key frame of the plurality of key frames with an observation angle that is most similar to that of the second video frame as the nearest key frame.
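The two-stage selection in Example 7 can be sketched as below. The sketch assumes that each key frame stores its camera center and its optical-axis direction in world coordinates, and that the observation angle is compared as the angle between optical axes; the `KeyFrame` type, the candidate count `k`, and the function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class KeyFrame:
    frame_id: int
    center: Vec3    # camera center in world coordinates
    view_dir: Vec3  # optical-axis direction in world coordinates


def angle_between(a: Vec3, b: Vec3) -> float:
    """Angle in radians between two (not necessarily unit) vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    cos = dot / (math.hypot(*a) * math.hypot(*b))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error


def select_nearest_key_frame(key_frames: Sequence[KeyFrame], center: Vec3,
                             view_dir: Vec3, k: int = 3) -> KeyFrame:
    # Step 1: the k key frames whose camera centers are closest to the current frame.
    candidates = sorted(key_frames, key=lambda kf: math.dist(kf.center, center))[:k]
    # Step 2: among those, the key frame whose viewing direction differs least.
    return min(candidates, key=lambda kf: angle_between(kf.view_dir, view_dir))
```

Filtering by center distance first prevents the angle test from choosing a distant key frame that merely faces the same way; a key frame that is both close and similarly oriented is likely to share a large captured region with the current frame, in line with Example 3.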

In Example 8, the subject matter of any one of Examples 4 to 7 may optionally include that, to refine the pose corresponding to the second video frame, the apparatus may solve an optimization problem using an inverse compositional Lucas-Kanade algorithm, and determine optimized values of the second rotation matrix and the translation vector of the second video frame.
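The inverse compositional Lucas-Kanade update named in Example 8 can be illustrated on the simplest warp, a 2D image translation. This is only a sketch of the update rule: refining a rotation matrix and translation vector would require a warp parameterized by the camera pose and the Jacobian of that warp, neither of which is given here. The sketch does show the property that makes the inverse compositional form attractive for real-time use: the template gradients and the Gauss-Newton Hessian are computed once, outside the iteration loop. Function names and parameters are hypothetical.

```python
import numpy as np


def bilinear(img, ys, xs):
    """Sample img at real-valued (ys, xs) with bilinear interpolation, clamped at the border."""
    h, w = img.shape
    xs = np.clip(xs, 0.0, w - 1.001)
    ys = np.clip(ys, 0.0, h - 1.001)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    ax, ay = xs - x0, ys - y0
    top = (1 - ax) * img[y0, x0] + ax * img[y0, x0 + 1]
    bottom = (1 - ax) * img[y0 + 1, x0] + ax * img[y0 + 1, x0 + 1]
    return (1 - ay) * top + ay * bottom


def ic_lucas_kanade_translation(template, image, p0=(0.0, 0.0), iters=50, tol=1e-6):
    """Estimate p = (px, py) such that image(x + p) ~= template(x)."""
    template = np.asarray(template, dtype=float)
    h, w = template.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # Precomputed once: steepest-descent images (dW/dp is the identity for a
    # translation) and the Gauss-Newton Hessian.
    gy, gx = np.gradient(template)
    sd = np.stack([gx.ravel(), gy.ravel()], axis=1)
    H = sd.T @ sd
    p = np.array(p0, dtype=float)
    for _ in range(iters):
        err = (bilinear(image, ys + p[1], xs + p[0]) - template).ravel()
        dp = np.linalg.solve(H, sd.T @ err)
        # Inverse compositional update W(x; p) <- W(x; p) o W(x; dp)^-1,
        # which for a translation reduces to p <- p - dp.
        p -= dp
        if np.linalg.norm(dp) < tol:
            break
    return p
```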

In Example 9, the subject matter of any one of Examples 2 to 8 may optionally include that the drift of the estimated pose corresponding to the second video frame may be reduced using the height of the apparatus corresponding to the second video frame.

In Example 10, the subject matter of any one of Examples 1 to 9 may optionally include that the second linear velocity may be estimated further based on the linear acceleration of the apparatus corresponding to the second video frame, the linear acceleration being provided by the inertial measurement unit of the apparatus, where the linear acceleration and the angular velocity may be assumed to be constant.
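A first-order sketch of the velocity prediction in Examples 1 and 10 follows. It assumes that the linear velocity is expressed in the body frame, that each rotation matrix maps the body frame to a z-up world frame, that the accelerometer reports specific force (so gravity must be added back), and that the angular velocity and linear acceleration stay constant over the frame interval `dt`, as Example 10 states. The velocity is advanced in the world frame and then re-expressed in the body frame of the second video frame, which is where the first rotation matrix and the angular velocity enter. Names and conventions are hypothetical.

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])  # assumed z-up world frame, m/s^2


def skew(w):
    """Skew-symmetric matrix such that skew(w) @ u == np.cross(w, u)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def so3_exp(phi):
    """Rodrigues' formula: rotation matrix for the rotation vector phi."""
    theta = np.linalg.norm(phi)
    if theta < 1e-12:
        return np.eye(3) + skew(phi)  # first-order approximation near zero
    K = skew(phi / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def predict_linear_velocity(v1, R1, omega, accel, dt, gravity=GRAVITY):
    """Predict the body-frame linear velocity (and rotation) at the second frame."""
    v1, omega, accel = (np.asarray(a, dtype=float) for a in (v1, omega, accel))
    # World-frame acceleration: rotated specific force plus gravity, held constant over dt.
    v_world = R1 @ v1 + (R1 @ accel + gravity) * dt
    # Orientation at the second frame from the constant angular velocity.
    R2 = R1 @ so3_exp(omega * dt)
    # Re-express the world-frame velocity in the second body frame.
    return R2.T @ v_world, R2
```

Holding the rotation at `R1` while integrating the acceleration is a forward-Euler simplification; a higher-order scheme would integrate the rotating specific force over the interval.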

A person skilled in the art will appreciate that the terminology used herein is for the purpose of describing various embodiments only and is not intended to be limiting of the present invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is understood that the specific order or hierarchy of blocks in the processes/flowcharts disclosed is an illustration of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of blocks in the processes/flowcharts may be rearranged. Further, some blocks may be combined or omitted. The accompanying method claims present elements of the various blocks in a sample order, and are not meant to be limited to the specific order or hierarchy presented.

The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein, but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects. Unless specifically stated otherwise, the term “some” refers to one or more. Combinations such as “at least one of A, B, or C,” “one or more of A, B, or C,” “at least one of A, B, and C,” “one or more of A, B, and C,” and “A, B, C, or any combination thereof” include any combination of A, B, and/or C, and may include multiples of A, multiples of B, or multiples of C. Specifically, combinations such as “at least one of A, B, or C,” “one or more of A, B, or C,” “at least one of A, B, and C,” “one or more of A, B, and C,” and “A, B, C, or any combination thereof” may be A only, B only, C only, A and B, A and C, B and C, or A and B and C, where any such combinations may contain one or more member or members of A, B, or C. All structural and functional equivalents to the elements of the various aspects described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims.
The words “module,” “mechanism,” “element,” “device,” and the like may not be a substitute for the word “means.” As such, no claim element is to be construed as a means plus function unless the element is expressly recited using the phrase “means for.” 

What is claimed is:
 1. A method of localization via visual inertial odometry, the method comprising: determining a first linear velocity of an unmanned vehicle and a first rotation matrix corresponding to a first video frame captured by a camera of the unmanned vehicle; estimating a second linear velocity of the unmanned vehicle corresponding to a second video frame captured by the camera based on the first linear velocity, the first rotation matrix, and an angular velocity of the unmanned vehicle corresponding to the second video frame, the angular velocity provided by an inertial measurement unit of the unmanned vehicle; constructing an optical flow based on feature points across the first video frame and the second video frame; refining the angular velocity and the second linear velocity via solving a quadratic optimization problem constructed based on the optical flow, the estimated second linear velocity, and the angular velocity provided by the inertial measurement unit; and estimating a pose of the unmanned vehicle corresponding to the second video frame based on the refined angular velocity and the refined second linear velocity.
 2. The method of claim 1, further comprising: reducing a drift of the estimated pose corresponding to the second video frame using a nearest key frame of the second video frame.
 3. The method of claim 2, wherein the nearest key frame of the second video frame is a key frame that shares the largest captured region with the second video frame.
 4. The method of claim 2, wherein the reducing of the drift of the estimated pose comprises: calculating initial values of a second rotation matrix and translation vector of the second video frame; selecting the nearest key frame of the second video frame; and refining the pose corresponding to the second video frame using feature points in a common captured region between the second video frame and the nearest key frame.
 5. The method of claim 4, wherein the initial value of the second rotation matrix of the second video frame is calculated based on the first rotation matrix of the first video frame.
 6. The method of claim 4, wherein the initial value of the translation vector of the second video frame is calculated based on the refined second linear velocity corresponding to the second video frame.
 7. The method of claim 4, wherein the selecting of the nearest key frame comprises: selecting a plurality of key frames with the smallest center distance to the second video frame; and selecting a key frame of the plurality of key frames with an observation angle that is most similar to that of the second video frame as the nearest key frame.
 8. The method of claim 4, wherein the refining of the pose corresponding to the second video frame comprises: solving an optimization problem using an inverse compositional Lucas-Kanade algorithm; and determining optimized values of the second rotation matrix and the translation vector of the second video frame.
 9. The method of claim 2, wherein the drift of the estimated pose corresponding to the second video frame is reduced using a height of the unmanned vehicle corresponding to the second video frame.
 10. The method of claim 1, wherein the second linear velocity is estimated further based on a linear acceleration of the unmanned vehicle corresponding to the second video frame, the linear acceleration provided by the inertial measurement unit of the unmanned vehicle, wherein the linear acceleration and the angular velocity are assumed to be constant.
 11. An apparatus for localization via visual inertial odometry, the apparatus comprising: a camera configured to capture a first video frame and a second video frame; an inertial measurement unit configured to measure an angular velocity of the apparatus corresponding to the second video frame; and at least one processor configured to: determine a first linear velocity of the apparatus and a first rotation matrix corresponding to the first video frame; estimate a second linear velocity of the apparatus corresponding to the second video frame based on the first linear velocity, the first rotation matrix, and the angular velocity; construct an optical flow based on feature points across the first video frame and the second video frame; refine the angular velocity and the second linear velocity via solving a quadratic optimization problem constructed based on the optical flow, the estimated second linear velocity, and the angular velocity; and estimate a pose of the apparatus corresponding to the second video frame based on the refined angular velocity and the refined second linear velocity.
 12. The apparatus of claim 11, wherein the at least one processor is further configured to: reduce a drift of the estimated pose corresponding to the second video frame using a nearest key frame of the second video frame.
 13. The apparatus of claim 12, wherein the nearest key frame of the second video frame is a key frame that shares the largest captured region with the second video frame.
 14. The apparatus of claim 12, wherein, to reduce the drift of the estimated pose, the at least one processor is configured to: calculate initial values of a second rotation matrix and translation vector of the second video frame; select the nearest key frame of the second video frame; and refine the pose corresponding to the second video frame using feature points in a common captured region between the second video frame and the nearest key frame.
 15. The apparatus of claim 14, wherein the initial value of the second rotation matrix of the second video frame is calculated based on the first rotation matrix of the first video frame.
 16. The apparatus of claim 14, wherein the initial value of the translation vector of the second video frame is calculated based on the refined second linear velocity corresponding to the second video frame.
 17. The apparatus of claim 14, wherein, to select the nearest key frame, the at least one processor is configured to: select a plurality of key frames with the smallest center distance to the second video frame; and select a key frame of the plurality of key frames with an observation angle that is most similar to that of the second video frame as the nearest key frame.
 18. The apparatus of claim 14, wherein, to refine the pose corresponding to the second video frame, the at least one processor is configured to: solve an optimization problem using an inverse compositional Lucas-Kanade algorithm; and determine optimized values of the second rotation matrix and the translation vector of the second video frame.
 19. A non-transitory computer-readable medium storing computer executable code, comprising instructions for: determining a first linear velocity of an unmanned vehicle and a first rotation matrix corresponding to a first video frame captured by a camera of the unmanned vehicle; estimating a second linear velocity of the unmanned vehicle corresponding to a second video frame captured by the camera based on the first linear velocity, the first rotation matrix, and an angular velocity of the unmanned vehicle corresponding to the second video frame, the angular velocity provided by an inertial measurement unit of the unmanned vehicle; constructing an optical flow based on feature points across the first video frame and the second video frame; refining the angular velocity and the second linear velocity via solving a quadratic optimization problem constructed based on the optical flow, the estimated second linear velocity, and the angular velocity provided by the inertial measurement unit; and estimating a pose of the unmanned vehicle corresponding to the second video frame based on the refined angular velocity and the refined second linear velocity.
 20. The non-transitory computer-readable medium of claim 19, wherein the computer executable code further comprises instructions for: reducing a drift of the estimated pose corresponding to the second video frame using a nearest key frame of the second video frame. 